An analog signal needs to be quantized and encoded into digital form before it is
transmitted via a digital channel. A frequently used scheme is differential pulse-code
modulation (DPCM), which predicts the current value from the previous data, so that
only prediction residuals, instead of the original data set, need to be transmitted.
Usually, DPCM reduces signal redundancy by using an AR(p) time series model to
describe the data and to perform the prediction. Due to the nonstationary nature of
signals, the parameters in the AR(p) model need to be adjusted from time to time.
Several adaptive DPCM schemes have been created to deal with this situation. A
different approach motivated by Bayesian analysis is proposed: the parameters are
treated as random variables and a prior distribution for the parameters is assumed to
be known. For fixed channel capacity, the optimal design of the bit allocation for
transmitting the parameters and the prediction residuals is obtained when mean
squared quantization error is used as the loss criterion.
CHAPTER 1


INTRODUCTION 


1.1  Quantization  of  Continuous  Signals 

With  the  continuing  growth  of  modern  communications  technology,  demand  for 
efficient  signal  transmission  and  storage  increases  rapidly.  These  practical  needs  lead 
to  a significant  development  in  theoretical  and  technical  research  in  the  related  fields. 
In  recent  years,  more  and  more  attention  has  been  concentrated  on  improving  the 
pace  and  efficiency  in  signal  processing  and  a lot  of  research  has  been  conducted 
in this area. How successfully a communication system works depends both on its
“hardware” and its “software.” Data compression, concerned with minimizing the
number of information-carrying units used to represent the signals, is one part of the
“software” that plays an important role in the operation of a communication system.

Signals  we  are  dealing  with  often  appear  in  the  analog  form  and  they  cannot  be 
directly  input  into  digital  devices.  Suppose  x is  a single  sample  from  a continuous 
population  with  probability  density  function  f(x)  defined  on  a domain  R.  Since  x 
can  take  an  infinite  number  of  values  within  a continuous  range,  it  is  impossible  to 
transmit  the  original  observation  directly  via  a digital  channel  due  to  the  limitation 
of  the  channel  capacity.  The  data  have  to  be  compressed  by  a quantization  procedure 
in  order  to  pass  a finite  capacity  digital  channel.  The  general  practice  is  to  sample 
the continuous signal at equally spaced intervals of time, and then to digitize each
observation to one of several levels predetermined by certain rules. Signal quantization
provides a digital representation for the analog signals. Much of the reason for this
use  of  digital  samples  of  the  analog  signal  is  that  they  can  be  stored,  manipulated 
and  transmitted  more  reliably  than  can  the  analog  signal  [1].  Nowadays,  more  and 
more  digital  facilities,  such  as  digital  computers  and  digital  channels,  are  used  in 
signal  processing  and  have  provided  a powerful  means  for  transmitting  and  storing 
digital  signals  in  a flexible  and  reliable  manner. 

Binary electrical devices are widely used in modern communication. The binary
system represents information by only two states, which are usually named “0”
and “1”. One basic information-carrying unit in the binary system, which is called
a “bit”, provides two stable states which can be used to represent one digit in the
binary system. So with m bits available, we can represent $2^m$ binary numbers.

An analog signal can be considered as a realization from a continuous-time stochastic
process with continuous state space. The determination of the optimal quantized
values for each observation depends on its statistical behavior.

In mathematically modeling the quantization process, we define a function of the
variable x as

\[ \hat{x} = q_k \quad \text{if } x \in Q_k, \tag{1.1} \]

where $k = 1, 2, \ldots, v$, $\{q_k\}$ is a predetermined real sequence and $\{Q_k\}$ is a partition of
the range R. The function $\hat{x}$ defined by (1.1) is called a quantizer of x, and v is called
the quantization level, which is determined by the number of bits used to quantize x.

Since it is the quantized values that are actually encoded and transmitted through
the channel, it is impossible to restore the original signal exactly. The difference
$e = x - \hat{x}$ is called the quantization error. The distortion of a quantizer is defined as the
expected value of some function of the quantization error, i.e.,

\[ D = E[g(x_i - \hat{x}_i)]. \]

Among a variety of distortion functions, the mean squared error, with $g(x) = x^2$,
is  the  most  frequently  used  criterion  to  evaluate  the  performance  of  a quantization 
scheme. 
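
As a minimal sketch of the definitions above, the following Python fragment builds a quantizer from a partition and a set of output values, and estimates the mean squared distortion on a sample. The function names, the example endpoints and levels, and the rule that a boundary point goes to the interval on its left are illustrative assumptions, not part of the original text.

```python
import bisect

def make_quantizer(endpoints, levels):
    """Return the map x -> q_k for x in Q_k.

    endpoints: interior interval endpoints in increasing order (v - 1 values).
    levels:    output values q_1, ..., q_v (v values).
    """
    if len(levels) != len(endpoints) + 1:
        raise ValueError("need exactly one more level than interior endpoints")

    def quantize(x):
        # bisect_left places a boundary point in the interval on its left
        # (an assumed convention for this sketch).
        return levels[bisect.bisect_left(endpoints, x)]

    return quantize

def mean_squared_distortion(sample, quantize):
    """Empirical version of D = E[g(x - x_hat)] with g(e) = e**2."""
    return sum((x - quantize(x)) ** 2 for x in sample) / len(sample)

# Hypothetical 2-bit quantizer: m = 2 bits give v = 2**2 = 4 levels.
quantize = make_quantizer([-1.0, 0.0, 1.0], [-1.5, -0.5, 0.5, 1.5])
distortion = mean_squared_distortion([0.2, -0.7, 1.6], quantize)
```

The sample average only approximates the expectation in the definition of D; with a known density the expectation would be computed by integration instead.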


1.2  Classification  of  the  Quantization  Schemes 

There  has  been  extensive  research  directed  toward  signal  quantization,  and  as  a 
result,  a large  variety  of  quantization  schemes  have  been  developed.  According  to 
the  ways  of  handling  the  quantizer  input,  these  schemes  can  be  classified  into  three 
basically  different  categories  [2,  3]. 

The first category is known as pulse-code modulation (PCM) [4, 5, 6]. In PCM,
the sample from the original signal is quantized directly without any previous
manipulation. The output of the quantizer is an amplitude-discrete sample which will be
transmitted through the channel. The advantages of PCM are its simplicity and easy
realization.  But  there  is  often  some  kind  of  redundancy  in  the  signal  which  is  carried 
on  in  the  quantization  and  transmission  processes  if  no  measure  is  taken  to  remove 
it.  So  PCM  schemes  will  produce  larger  quantization  error  and  cost  more  channel 
capacity  compared  with  some  advanced  schemes  [7,  8].  Despite  the  disadvantage, 
PCM  does  lay  down  a basis  for  all  other  kinds  of  schemes. 

The  second  category  is  called  differential  pulse-code  modulation  (DPCM)  [9,  10, 
11].  DPCM  uses  the  sample  intercorrelation  to  reduce  the  redundancy  carried  by 
the  original  source  signals.  In  the  DPCM  system,  the  current  value  is  predicted 
from the previous observations, so the sample $\{x_i\}$ can be represented by a model
which consists of some parameters plus a residual series $\{\varepsilon_i\}$. Instead of encoding the
observations themselves, the prediction residuals are encoded and then transmitted
via the channel. Since the key function is played by the predictor, DPCM is also
called predictive quantization. Due to the reduction of the redundancy by DPCM,
the signal can be transmitted more efficiently than with PCM. DPCM can be viewed as
a scheme that adopts good features from regular PCM and so is superior to it in
performance.

The  third  category  contains  all  kinds  of  transform  quantization  schemes  [12,  13]. 
In these schemes, the sample is divided into data blocks and a transformation is
performed in each block. The redundancy is reduced either by an information preserving
transformation  to  yield  a new  block  containing  a minimum  number  of  observations 
[14],  or  by  a decorrelating  transformation  to  make  the  new  block  uncorrelated  [15]. 
As  they  treat  a group  of  data  as  a whole,  the  schemes  in  this  category  are  also  called 
block  quantization  or  vector  quantization. 

It  should  be  pointed  out  that  some  other  quantization  schemes  also  exist.  They 
may  use  a combination  of  the  schemes  from  the  above  three  categories. 

1.3  The  Bayesian  Approach 

In  practice,  the  signals  we  deal  with  are  often  nonstationary.  The  conventional 
time-invariant  schemes  may  lose  their  efficiency  in  handling  nonstationary  signals  if 
no  adjustment  is  made.  A special  but  very  common  type  of  nonstationarity  is  caused 
by  the  change  of  the  signal  features  which  decide  the  statistical  pattern  of  the  signal. 
This  type  of  nonstationarity  can  be  described  by  a model  governed  by  a set  of  varying 
parameters  and  is  considered  in  this  paper. 

Many kinds of adaptive quantization schemes have been created targeting different
adaptation objects. In PCM, the sizes of the quantization intervals are adapted to
match the signal variance. The idea is to modify the interval sizes by a factor depending
on the statistical nature of the previous samples and use the modified intervals to
quantize the next observation [16, 17]. In DPCM, the parameters are also adapted
periodically in addition to modifying the quantizer [18, 19, 20]. Once the parameters
are updated by the current observations, the new parameters will be used to perform
further  predictions.  In  transform  quantization,  the  adaptation  is  often  made  either 
to  the  quantizer  to  fit  the  statistical  nature  of  the  local  block  or  to  the  design  of  the 
bit  allocation  to  improve  the  quantization  efficiency. 

In this dissertation, a new approach to quantizing nonstationary signals is developed
by combining the Bayesian principle with the techniques used in PCM, DPCM
and transform quantization. We call it the Bayesian approach. The sample is assumed
to  come  from  an  autoregressive  time  series  with  varying  parameters.  To  overcome 
the  difficulty  brought  by  the  nonstationarity,  we  divide  the  whole  process  into  blocks 
with  the  same  size,  and  stationarity  is  assumed  in  each  block;  i.e.,  each  block  forms 
an  autoregressive  process  with  fixed  parameters.  This  is  a reasonable  assumption  if 
the  sampling  frequency  is  large  and  the  block  is  relatively  small. 

Inside  each  block,  a modified  nonadaptive  DPCM  procedure  is  applied  based  on 
the  fixed  parameter  model.  In  a conventional  DPCM  system,  the  parameters  are  first 
estimated  on  the  transmitter  side  and  the  estimate  is  used  to  predict  the  future  values 
of the sample. The differences between the observations and their predicted values are
quantized  and  transmitted  to  the  receiver.  Since  no  bookkeeping  information  about 
the  parameters  is  passed  over,  the  parameters  need  to  be  estimated  again  on  the 
receiver  side  in  order  to  recover  the  original  signal.  Though  the  two  estimations  are 
made  based  on  the  same  formula,  the  second  estimation  may  contain  bias  because  the 
quantized  data  are  used  on  the  receiver  side.  More  seriously,  the  biased  estimator  may 
cause  further  prediction  error  which  may  accumulate.  It  means  that  any  distortion  at 
the  receiver  may  be  perpetuated  as  an  error  in  all  future  values  of  the  reconstructed 
signal. In our approach, an attempt is made to avoid the error accumulation in the
signal reconstruction process by transmitting the prediction residuals as well as the
quantized parameters. Motivated by the principle of Bayesian analysis [21], we
treat the varying parameters as random variables and assign them a joint prior
density function based on the previous knowledge about the process. Since only the
quantized  values  of  the  parameters  are  available  for  the  signal  reconstruction,  it  is 
advantageous  to  use  the  quantized  parameters  for  prediction  on  the  transmitter  side. 
The whole procedure can be stated as follows. We quantize the parameters and then
use the quantized parameters to perform prediction. The quantized values of the
parameters are transmitted to the receiver as well as the quantized residuals. The
predictor on the receiver side will work on these values to recover the original signal.
If the transmission channel is errorless, the distortion of the recovered signal consists
only of the quantization errors of the residuals, which are generally quite small.
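
The procedure just described can be sketched for a single block under strong simplifying assumptions that are not taken from the text: an AR(1) model, a least-squares estimate of its parameter, unbounded uniform quantizers whose step sizes stand in for bit counts, and a first observation predicted as zero. All function names are hypothetical. The point of the sketch is only that both sides predict with the same quantized parameter, so the reconstruction error is the residual quantization error.

```python
def uniform_quantize(value, step):
    """Round to the nearest multiple of step (unbounded uniform quantizer)."""
    return step * round(value / step)

def encode_block(block, phi_step, res_step):
    """AR(1) sketch: quantize the parameter first, then predict with it."""
    # Least-squares estimate of the AR(1) parameter (zero mean assumed).
    num = sum(a * b for a, b in zip(block[1:], block[:-1]))
    den = sum(a * a for a in block[:-1])
    phi_q = uniform_quantize(num / den if den else 0.0, phi_step)

    residuals, prev = [], 0.0
    for x in block:
        pred = phi_q * prev                  # first sample is predicted as 0
        r = uniform_quantize(x - pred, res_step)
        residuals.append(r)
        prev = pred + r                      # value the receiver will also hold
    return phi_q, residuals

def decode_block(phi_q, residuals):
    """Receiver: rebuild the block from the quantized parameter and residuals."""
    out, prev = [], 0.0
    for r in residuals:
        prev = phi_q * prev + r
        out.append(prev)
    return out

block = [1.0, 0.9, 0.7, 0.8, 0.5, 0.45]
phi_q, residuals = encode_block(block, phi_step=0.125, res_step=0.05)
recovered = decode_block(phi_q, residuals)
```

In this sketch every recovered value differs from the original by at most half the residual step, whatever the parameter quantization error is; a coarser parameter only enlarges the residuals that have to be quantized.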

In  the  new  scheme,  the  quantization  of  the  residuals  is  closely  related  to  that 
of  the  parameters.  High  level-quantized  parameters  will  reduce  the  variance  of  the 
residuals  and  so  increase  the  accuracy  of  the  prediction.  On  the  other  hand,  high 
level  parameter  quantization  will  cost  more  channel  capacity  and  leave  fewer  bits 
available  for  quantizing  the  residuals.  To  find  the  optimal  bit  allocation,  we  need 
to  examine  how  seriously  the  reconstruction  is  affected  by  the  quantization  of  the 
parameters.  Since  the  blocks  are  treated  independently  in  our  scheme,  there  is  at 
least  one  nonpredictable  observation  in  each  block  that  has  no  previous  information. 
Thus,  it  is  unwise  to  distribute  the  bits  equally  among  all  the  observations  without 
considering the difference in predictability. Therefore, the bit allocation among the
parameters, the predictable observations and the nonpredictable observations needs to
be determined in order to obtain the minimum distortion.

As pointed out above, the quantized parameters are used in our prediction procedure,
so the residuals are no longer uncorrelated. Under this circumstance, it seems
difficult to obtain the exact theoretical result. In this dissertation, an effort is made
to get a close theoretical approximation for the residuals based on their limiting
behavior as the quantization levels tend to infinity. Simulation is performed to verify
the closeness of this approximation when the quantization levels are small. Comparison
is made with the conventional DPCM scheme and another adaptive DPCM
scheme.


CHAPTER  2 


QUANTIZATION  SCHEMES 


2.1  Introduction 


As mentioned in the previous chapter, the variety of quantization schemes
can be generally classified into three categories, namely, PCM, DPCM and transform
quantization, according to how they process signals before inputting them to the
quantizer.

PCM uses the original sample as the quantization input and obtains a digital representation
from the quantization output. The research in this field has concentrated
on finding the optimal quantizer to minimize a certain type of distortion measure. In
2.2, an optimal quantizer, in the sense of minimizing the mean squared quantization
error, is introduced and its properties are discussed. Except for variables with a uniform
distribution, the functional form of the quantizer is often too complicated to be expressed
analytically. An iterative algorithm is suggested to obtain the numerical
solution. It is also desirable to express the quantization error as a function of the
quantization levels so that further analysis can be conducted. An approximation
based on a large quantization level is derived in this section.

DPCM makes an effort to improve the prediction as well as to optimize the quantization.
The two main procedures of DPCM, namely prediction and quantization,
are briefly discussed in 2.3. In DPCM, a statistical model is needed to perform the
prediction. The widely used autoregressive model is described and the estimation
and prediction with this model are considered. After the prediction residuals are obtained,
the results obtained from PCM can be applied in the quantization procedure
in DPCM.

Transform  quantization  is  an  alternative  to  DPCM  to  improve  the  quantization 
input.  Two  main  concerns  in  this  scheme  are  the  selection  of  transform  matrix  and  the 
allocation  of  the  bits.  Both  aspects  are  considered  in  2.4.  Several  transform  methods 
are  given  and  their  optimality  and  complexity  are  discussed.  The  bit  allocation 
problem  arises  as  the  components  of  the  transformed  sample  may  not  have  identical 
variance.  An  optimal  design  for  the  bit  allocation  is  presented  in  this  section. 

All  schemes  mentioned  above  can  be  made  adaptive  to  fit  nonstationary  samples. 
In  DPCM,  either  the  predictor  or  the  quantizer  can  be  updated  by  using  the  newly 
observed data. In transform quantization, the adaptation is achieved by the adjustment
of the transform matrix or by changing the bit assignments for different blocks.
Adaptive  quantization  is  the  topic  discussed  in  2.5. 

The  content  reviewed  in  this  chapter  forms  the  theoretical  basis  for  our  new 
approach. 


2.2  Minimum  Mean  Square  Error  Quantization 

PCM  is  a simple  quantization  scheme  in  digital  transmission  but  it  provides  some 
necessary  means  and  techniques  to  other  more  comprehensive  schemes.  In  their  early 
paper  in  1948,  Oliver,  Pierce  and  Shannon  elaborated  the  philosophy  of  PCM  and 
pointed out the differences between what can be achieved with PCM and with some
other previously applied systems [4].

As mentioned previously, PCM is a scheme which gives a discrete digital representation
to the analog signals. Since the original signal cannot be fully recovered
from its quantized values [22], great effort has been made to find an optimal scheme
to minimize the distortion. In our approach, the mean square error (MSE) is used as
the criterion to compare different schemes.

Minimum  mean  squared  error  quantizer 

In the early sixties, Lloyd [23] and Max [5] independently solved the problem of
optimal quantization for operation on a single sample of a random process with some
specified probability density.

Let X be a random variable with density function f(x) defined on the domain R
and x be an observation. To obtain the optimal quantization scheme which minimizes
the MSE for a given quantization level v, one just needs to find a partition $\{Q_k\}$ and
the corresponding quantization values $\{q_k\}$ such that

\[ D = E[(X - \hat{X})^2] = \sum_{k=1}^{v} \int_{Q_k} (x - q_k)^2 f(x)\,dx \tag{2.1} \]

is minimized.

We now consider the minimization problem in two steps. First, we assume that
the partition $\{Q_k\}$ has already been chosen. A well known result in statistics tells us
that

\[ \int_{Q_k} (x - q_k)^2 f(x)\,dx \]

attains its minimum if and only if $q_k = E(X \mid X \in Q_k)$. That is to say, the best choice
of $\{q_k\}$ is just the center of mass of X in $\{Q_k\}$ with density f(x).

Next consider the problem of finding the best $\{Q_k\}$ given that the values $\{q_k\}$
are fixed. Without loss of generality, assume $q_1 < q_2 < \cdots < q_v$. Since $(x - q_k)^2$ has
positive weight f(x) in the expression of the mean squared quantization error, it is
natural that any value closer to $q_k$ than to any other quantization value should be
assigned to $Q_k$. Ignoring a subset with zero probability, $Q_k$ should be chosen as

\[ Q_k = \{x : (x - q_k)^2 < (x - q_j)^2 \ \text{for all } j \ne k\}. \]

The boundary point can be assigned to either of the two sets it separates without
affecting the mean square quantization error.

It is straightforward that $\{Q_k\}$, the partition of R, should be intervals whose
endpoints, say $c_k$, $k = 0, 1, \ldots, v$, bisect the segments between successive $\{q_k\}$, except
$c_0$ and $c_v$, which can be any values such that $-\infty \le c_0 \le \inf\{x : x \in R\}$ and
$\sup\{x : x \in R\} \le c_v \le \infty$. So the best partition $\{Q_k\}$ should be a selection of
intervals with endpoints $\{c_k\}$. By convention, we assign the boundary point to the
interval on its left, i.e.,

\[ \{Q_k\} = \{(c_{k-1}, c_k]\}. \]


From the above discussion, we have obtained the following necessary condition
for $\{Q_k\}$ and $\{q_k\}$ to be optimal:

\[
\begin{cases}
\displaystyle \int_{c_{k-1}}^{c_k} (x - q_k) f(x)\,dx = 0, & k = 1, 2, \ldots, v, \\[1ex]
c_k = (q_k + q_{k+1})/2, & k = 1, 2, \ldots, v - 1.
\end{cases}
\tag{2.2}
\]

The above equations can also be obtained by differentiating the total mean square
quantization error D defined in (2.1) with respect to $\{q_k\}$ and $\{c_k\}$, then setting the
derivatives to zero.

The sufficient condition is that the Hessian matrix of $E[(X - \hat{X})^2]$ is positive
definite.

The Lloyd-Max (L-M) quantizer has the following properties which can be easily proved [24].

(1) $E(\hat{X}) = E(X)$.

(2) $E(X\hat{X}) = E(\hat{X}^2)$.

(3) $\mathrm{Cov}(X, X - \hat{X}) > 0$.

The quantized values and the end points of the quantization intervals for the
standard normal distribution are given by Max [5], and they were also obtained for
the gamma and Laplacian distributions [25].

A numerical  algorithm 

Because  of  the  complicated  functional  relationships  which  are  likely  to  be  induced 
by  f(x),  the  analytic  solution  of  the  simultaneous  equations  (2.2)  may  not  be  easily 
obtained.  Lloyd  and  Max  introduced  a numerical  method  to  solve  this  problem  [5,  23]. 

Choose $-\infty = c_0^{(1)} < c_1^{(1)} < \cdots < c_v^{(1)} = \infty$ arbitrarily as the initial values
and then let $\{q_i^{(1)},\ i = 1, 2, \ldots, v\}$ be the center of mass of x in $(c_{i-1}^{(1)}, c_i^{(1)}]$. By
the above discussion, the optimal choice of $\{c_i\}$ should be the center points of the
intervals $\{(q_i, q_{i+1}]\}$, which in general may not be satisfied by the arbitrary choice of
$\{c_i^{(1)}\}$. So the second trial will choose

\[ c_i^{(2)} = \frac{q_i^{(1)} + q_{i+1}^{(1)}}{2} \quad \text{for } i = 1, 2, \ldots, v - 1, \]

and correspondingly

\[ q_i^{(2)} = E\left[x \mid c_{i-1}^{(2)} < x \le c_i^{(2)}\right]. \]

Note that the values for $c_0$ and $c_v$ remain unchanged throughout the whole procedure.

The values of $\{c_i^{(2)}\}$ and $\{q_i^{(2)}\}$ are obtained as the result of imposing the center
point condition and center of mass condition again. Since each condition is optimal
when the other set of values is held fixed, the resulting mean squared error after the
second step of the iteration cannot increase, i.e.,

\[ \mathrm{MSE}^{(2)} \le \mathrm{MSE}^{(1)}. \]

Continuing this process, we get a sequence of values for the MSE. After the mth step,
we have

\[ \mathrm{MSE}^{(m)} \le \mathrm{MSE}^{(m-1)} \le \cdots \le \mathrm{MSE}^{(1)}. \]

Since it is a nonnegative and nonincreasing sequence, the limit of $\{\mathrm{MSE}^{(m)}\}$ exists
as m tends to infinity [26], and consequently, the limits of $\{c_i^{(m)}\}$ and $\{q_i^{(m)}\}$ should
possess the optimal property.
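
The iteration can be sketched in Python for the standard normal density, where the center of mass of an interval has a closed form. This is an illustrative sketch rather than the original authors' program: the function names, the equally spaced starting endpoints on [-3, 3] and the fixed number of steps are assumptions made here.

```python
import math

def phi(x):
    """Standard normal density; 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def x_phi(x):
    """x * phi(x), defined as 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else x * phi(x)

def centroids(c):
    """Center of mass of each interval (c[i-1], c[i]] under N(0, 1)."""
    return [(phi(a) - phi(b)) / (Phi(b) - Phi(a)) for a, b in zip(c, c[1:])]

def mse(c, q):
    """Sum over intervals of the integral of (x - q_k)^2 f(x) dx."""
    total = 0.0
    for a, b, qk in zip(c, c[1:], q):
        p = Phi(b) - Phi(a)               # integral of f
        m1 = phi(a) - phi(b)              # integral of x f
        m2 = p + x_phi(a) - x_phi(b)      # integral of x^2 f
        total += m2 - 2.0 * qk * m1 + qk * qk * p
    return total

def lloyd_max(v, steps=200):
    """Alternate the center point and center of mass conditions."""
    c = [-math.inf] + [-3.0 + 6.0 * i / v for i in range(1, v)] + [math.inf]
    q = centroids(c)
    history = [mse(c, q)]
    for _ in range(steps):
        # Center point condition; the outer endpoints stay fixed.
        c = [c[0]] + [(q[i] + q[i + 1]) / 2.0 for i in range(v - 1)] + [c[-1]]
        # Center of mass condition.
        q = centroids(c)
        history.append(mse(c, q))
    return c, q, history

endpoints, levels, history = lloyd_max(4)
```

The list `history` records the MSE after each step, so the monotone behavior claimed above can be inspected directly. For very large v the outermost starting intervals can carry almost no probability, in which case the division in `centroids` would need guarding.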

Some  approximations 

Though a numerical method is available, it is still desirable to get a closed form
expression for $\{c_k\}$ and $\{q_k\}$. An analytical approximation for $\{c_k\}$ and $\{q_k\}$ was
obtained by Roe [27]. When X has a standard normal distribution,

\[ c_k = \sqrt{6}\, \mathrm{erf}^{-1}\!\left(\frac{2k - v}{v + 0.8532}\right), \tag{2.3} \]

where $\mathrm{erf}^{-1}$ is the inverse error function.

Once the endpoints of the quantization intervals are obtained, the approximated
values of $\{q_k\}$ can be obtained from (2.2).

Though the approximation is based on a large quantization level, it works quite
well even for moderate values of v. For the case of v = 6, the individual values of $c_k$
calculated from (2.3) are within 3 percent of the correct values. For v = 36, they are
within 0.5 percent and the resulting quantization error is indistinguishable from the
true error.
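
A short Python sketch of Roe's approximation follows, assuming (2.3) reads $c_k = \sqrt{6}\,\mathrm{erf}^{-1}((2k - v)/(v + 0.8532))$. The Python standard library has no inverse error function, so the sketch derives one from the standard normal quantile function; the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def erfinv(y):
    """Inverse error function for -1 < y < 1, via the N(0, 1) quantile.

    Uses erf(z) = 2 * Phi(z * sqrt(2)) - 1, so z = Phi^{-1}((y + 1) / 2) / sqrt(2).
    """
    return NormalDist().inv_cdf((y + 1.0) / 2.0) / math.sqrt(2.0)

def roe_endpoints(v):
    """Approximate interior endpoints c_1, ..., c_{v-1} for the standard normal."""
    return [math.sqrt(6.0) * erfinv((2 * k - v) / (v + 0.8532)) for k in range(1, v)]

approx_endpoints = roe_endpoints(6)
```

The resulting endpoints are symmetric about zero and can be used as starting values for the iterative algorithm instead of an arbitrary initial choice.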


It is helpful to obtain the relationship between the MSE and the quantization
level for studying the error rate. Max showed by a graphing argument that

\[ E[(X - \hat{X})^2] \approx k(v)\, v^{-2}, \]

where k(v) is some function of v.

An approximation for the quantization error, when v is large enough, was given
by Panter and Dite [28]. If the quantization level v is large, $f(x) \approx f(q_i)$ for
$x \in Q_i$, $i = 1, 2, \ldots, v$, and $q_i$ is almost the center point of the interval $(c_{i-1}, c_i]$. Let
$\Delta_i = q_i - c_{i-1} = c_i - q_i$, then

\[
E[(X - \hat{X})^2] \approx \sum_{i=1}^{v} \frac{f(q_i)}{3}\left[(c_i - q_i)^3 + (q_i - c_{i-1})^3\right]
= \frac{2}{3} \sum_{i=1}^{v} f(q_i)\,\Delta_i^3
= \frac{2}{3} \sum_{i=1}^{v} u_i^3, \tag{2.4}
\]

where $u_i = f^{1/3}(q_i)\,\Delta_i$. Hence,

\[ \sum_{i=1}^{v} u_i \approx \frac{1}{2} \int_R f^{1/3}(x)\,dx = C, \tag{2.5} \]

where C is some constant.

This problem is now reduced to minimizing the sum of cubes subject to the
condition that the sum of the variables is a constant. From Lagrange's method of
undetermined multipliers, it follows that, subject to (2.5), the expression (2.4) is
minimized when

\[ u_1 = u_2 = \cdots = u_v = \frac{C}{v}. \]

Therefore,

\[ E[(X - \hat{X})^2] \approx \frac{1}{12 v^2} \left[\int_R f^{1/3}(x)\,dx\right]^3. \tag{2.6} \]


This formula gives an explicit analytic approximation for the MSE as a function
of the quantization level v. Lloyd obtained the same result from a different approach
[23]. Lloyd also proved that the difference between the approximation and the MSE
is of higher order (that is, asymptotically smaller) than $v^{-2}$, i.e.,

\[ E[(X - \hat{X})^2] = \frac{1}{12 v^2} \left[\int_R f^{1/3}(x)\,dx\right]^3 + o(v^{-2}). \tag{2.7} \]
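
As a numerical illustration, the large-v approximation can be evaluated for any density by integrating $f^{1/3}$. The sketch below assumes the approximation has the usual Panter and Dite form $\frac{1}{12 v^2}\left[\int f^{1/3}(x)\,dx\right]^3$; the function names, the integration range and the midpoint rule are choices made here for illustration. For the standard normal density the integral can be done by hand, $\int f^{1/3}(x)\,dx = (2\pi)^{-1/6}\sqrt{6\pi}$, which gives $\sqrt{3}\,\pi / (2 v^2)$.

```python
import math

def panter_dite_mse(density, v, lo=-12.0, hi=12.0, n=20000):
    """(1 / (12 v^2)) * (integral of density^(1/3))^3, by the midpoint rule."""
    h = (hi - lo) / n
    integral = h * sum(density(lo + (j + 0.5) * h) ** (1.0 / 3.0) for j in range(n))
    return integral ** 3 / (12.0 * v * v)

def std_normal(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

approx_mse_8 = panter_dite_mse(std_normal, 8)
closed_form_8 = math.sqrt(3.0) * math.pi / (2.0 * 8 ** 2)
```

Because the formula is an asymptotic one, it should be compared with the exact MSE of the optimal quantizer only for reasonably large v.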


It  is  trivial  that  a higher  quantization  level  will  produce  a smaller  mean  squared 
error.  When  the  quantization  level  v is  given,  the  lower  bound  for  the  mean  squared 
quantization  error  for  any  quantizer  is  given  by  [6] 


The  above  lower  bound  can  be  used  to  evaluate  the  performance  of  any  quantizer. 

2.3 Prediction and Quantization in DPCM

The operation of a typical DPCM system can be described as follows. A predictor
on the transmitter side works on the previous data and produces a prediction for
the present value. The difference between the true value and the prediction is then
quantized and transmitted to the receiver. On the receiver side, another predictor
works synchronously with the one on the transmitter side. The prediction generated
by this predictor is added to the received difference to reconstruct the original signal.
The variance of the differences input to the quantizer is generally smaller than that
of the original signal because the redundancy concealed in the sample has been removed.
Therefore, it provides a possible way to save bandwidth at an acceptable
distortion rate or to obtain smaller distortion at fixed channel capacity. DPCM is more
powerful and more efficient than the PCM scheme in quantizing and transmitting highly
correlated signals, since it takes advantage of the knowledge about intersample
correlation in the prediction procedure. This is the key point making DPCM superior
to PCM.
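
The transmitter and receiver loops can be sketched as follows. The sketch assumes a first order predictor with a fixed, known coefficient `a`, an unbounded uniform quantizer, and a transmitter that predicts from its own reconstructed values so that both predictors stay synchronized; none of these choices is prescribed by the text, and the names are hypothetical.

```python
import math

def quantize(e, step):
    """Uniform quantizer: nearest multiple of step (no overload region)."""
    return step * round(e / step)

def dpcm_encode(x, a, step):
    """Return the quantized prediction differences for the sequence x."""
    diffs, prev = [], 0.0
    for xi in x:
        pred = a * prev                 # prediction from the reconstructed past
        d = quantize(xi - pred, step)   # quantized difference sent to the receiver
        diffs.append(d)
        prev = pred + d                 # same value the receiver will rebuild
    return diffs

def dpcm_decode(diffs, a):
    """Receiver: add each received difference to the synchronized prediction."""
    out, prev = [], 0.0
    for d in diffs:
        prev = a * prev + d
        out.append(prev)
    return out

def variance(values):
    mean = sum(values) / len(values)
    return sum((u - mean) ** 2 for u in values) / len(values)

signal = [math.sin(0.1 * i) for i in range(50)]   # smooth, highly correlated
diffs = dpcm_encode(signal, a=0.95, step=0.05)
rebuilt = dpcm_decode(diffs, a=0.95)
```

For this smooth test signal the differences have a much smaller variance than the signal itself, which is the effect the paragraph above describes; a practical quantizer would have a finite number of levels and hence an overload region that this sketch ignores.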

DPCM was first described by Cutler of the Bell Telephone Laboratories in his original
patent in 1952 [29]. Since then, much research has been conducted to improve
and refine this scheme. Elias discussed this scheme in detail and suggested
some criteria for the optimal predictor [10]. Goldstein and Liu found the joint distribution
of the quantization error and the step-size of the uniform quantizer adopted
in the system based on the assumption that the sample forms a first order Markov
sequence [30]. Arnstein formulated the quantization error by a difference equation
and suggested an iterative algorithm to calculate the numerical result [31].

In DPCM, the sample needs to be modeled in order to perform predictions. The
model used by DPCM is an autoregressive process, i.e., a time-variate process with
each observation drawn from some probability distribution conditioned on the previous
observations. When the current value is a linear combination of the previous p
observations plus some white noise, it is called a pth order autoregressive time series
(AR(p)) [32, 33]. Practical experience has shown that the AR(p) model fits a large
variety of data very well. Much of the literature about coding and quantization is
concentrated on this model [18, 34, 35].

Consider a stationary autoregressive sequence $\{x_i,\ i = 0, \pm 1, \pm 2, \ldots\}$ with common
mean zero and variance $\sigma_x^2$. An AR(p) model is defined by

\[ x_i = \phi_1 x_{i-1} + \phi_2 x_{i-2} + \cdots + \phi_p x_{i-p} + \varepsilon_i, \tag{2.8} \]

where $\{\varepsilon_i\}$ is the white noise sequence.

In practice, $\{x_i\}$ should be able to be written as a function of the previous noise
sequence in order to have a realistic physical interpretation [33]. For a process to possess
this property, the parameters $\{\phi_j\}$ should satisfy the condition that the complex
equation

\[ 1 - (\phi_1 z + \phi_2 z^2 + \cdots + \phi_p z^p) = 0 \]

has no solution z on or inside the unit circle. The processes satisfying this condition
are called causal.
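
The causality condition can be checked without computing the roots explicitly. The sketch below does not solve the equation; it swaps in the step-down (Schur-Cohn type) recursion, which repeatedly strips the last coefficient and is known to succeed exactly when all roots of $1 - \phi_1 z - \cdots - \phi_p z^p$ lie outside the unit circle. The function name is hypothetical and real coefficients are assumed.

```python
def is_causal(phi):
    """Step-down test for an AR(p) model with coefficients phi_1, ..., phi_p.

    At each order m the last coefficient plays the role of a partial
    autocorrelation; the model is causal iff every one of them has
    absolute value strictly less than 1.
    """
    a = list(phi)
    while a:
        k = a[-1]
        if abs(k) >= 1.0:
            return False
        m = len(a)
        # Order m -> order m - 1 (inverse of the Durbin-Levinson update).
        a = [(a[j] + k * a[m - 2 - j]) / (1.0 - k * k) for j in range(m - 1)]
    return True
```

For example, `is_causal([1.5, -0.56])` is true because the roots of $1 - 1.5z + 0.56z^2$ are about 1.25 and 1.43, while `is_causal([2.0, -0.75])` is false because one root of $1 - 2z + 0.75z^2$ is 2/3.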

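The lag h autocovariance and autocorrelation functions are defined by

\[ \gamma_h = \mathrm{Cov}(X_i, X_{i+h}) \]

and

\[ \rho_h = \frac{\mathrm{Cov}(X_t, X_{t+h})}{\mathrm{Var}(X_t)} = \frac{\gamma_h}{\gamma_0}, \]

respectively. The most frequently used estimators for $\gamma_h$ and $\rho_h$ are the sample
covariance and sample correlation given by

\[ \hat{\gamma}_h = \frac{1}{n} \sum_{i=1}^{n-h} (X_i - \bar{X})(X_{i+h} - \bar{X}) \]

and

\[ \hat{\rho}_h = \frac{\hat{\gamma}_h}{\hat{\gamma}_0}, \]

where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ is the sample mean.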

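An estimate for the parameters can be obtained by substituting the sample correlation
for the true correlation in the famous Yule-Walker equation [33]

\[ \hat{\Phi} = \hat{P}^{-1} \hat{\rho}, \]

where

\[ \hat{\Phi} = (\hat{\phi}_1, \hat{\phi}_2, \ldots, \hat{\phi}_p)', \qquad \hat{\rho} = (\hat{\rho}_1, \hat{\rho}_2, \ldots, \hat{\rho}_p)' \]

and

\[
\hat{P} = \begin{pmatrix}
1 & \hat{\rho}_1 & \hat{\rho}_2 & \cdots & \hat{\rho}_{p-1} \\
\hat{\rho}_1 & 1 & \hat{\rho}_1 & \cdots & \hat{\rho}_{p-2} \\
\hat{\rho}_2 & \hat{\rho}_1 & 1 & \cdots & \hat{\rho}_{p-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\hat{\rho}_{p-1} & \hat{\rho}_{p-2} & \hat{\rho}_{p-3} & \cdots & 1
\end{pmatrix}.
\]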
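
A possible implementation of this estimate is sketched below, assuming the Yule-Walker system has the form $\hat{P}\hat{\Phi} = \hat{\rho}$ with $\hat{P}$ the Toeplitz matrix of sample correlations. Instead of inverting $\hat{P}$, the sketch solves the system with the Durbin-Levinson recursion, a standard method for Toeplitz systems of this kind; the function names are hypothetical.

```python
def sample_autocorrelations(x, p):
    """Sample correlations rho_1, ..., rho_p using the 1/n covariance estimator."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    gamma = [sum(d[i] * d[i + h] for i in range(n - h)) / n for h in range(p + 1)]
    return [g / gamma[0] for g in gamma[1:]]

def yule_walker(rho):
    """Solve P * phi = rho, where rho = [rho_1, ..., rho_p]."""
    phi = []
    for m in range(1, len(rho) + 1):
        # New last coefficient phi_{m,m}.
        num = rho[m - 1] - sum(phi[j] * rho[m - 2 - j] for j in range(m - 1))
        den = 1.0 - sum(phi[j] * rho[j] for j in range(m - 1))
        k = num / den
        # Update the first m - 1 coefficients, then append phi_{m,m}.
        phi = [phi[j] - k * phi[m - 2 - j] for j in range(m - 1)] + [k]
    return phi

# With the exact correlations of an AR(1) process with parameter 0.6,
# a second order fit should return (0.6, 0).
fitted = yule_walker([0.6, 0.36])
```

In practice `yule_walker(sample_autocorrelations(x, p))` would be applied to an observed block x; the exact correlations are used here only so that the expected answer is known.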

We now consider the prediction problem based on the model we have established.
In view of the demand in DPCM, we are interested only in one step ahead prediction.
That is, we want to predict the value of $x_{n+1}$ based on our knowledge about
$\{x_n, x_{n-1}, \ldots\}$. It is well known that the minimum MSE predictor [33] is

\[ \hat{X}_{n+1} = E[X_{n+1} \mid X_n, X_{n-1}, \ldots]. \]

Recall the Markov property that the distribution of $X_{n+1}$ depends only on the p
preceding observations. So the MSE predictor for the AR(p) model is

\[ \hat{X}_{n+1} = E[X_{n+1} \mid X_n, X_{n-1}, \ldots, X_{n-p+1}] = \phi_1 X_n + \phi_2 X_{n-1} + \cdots + \phi_p X_{n-p+1}. \tag{2.9} \]


The prediction residual is given by

$$X_{n+1} - \hat X_{n+1} = X_{n+1} - (\phi_1 X_n + \phi_2 X_{n-1} + \cdots + \phi_p X_{n-p+1}) = \epsilon_{n+1}, \tag{2.10}$$

which is the noise term at time $n + 1$. This means that all the redundancy has been removed and only the uncontrollable random factor is left.

Note that

$$\begin{aligned} \mathrm{Var}(X_{n+1}) &= \mathrm{Var}(\phi_1 X_n + \phi_2 X_{n-1} + \cdots + \phi_p X_{n-p+1} + \epsilon_{n+1}) \\ &= \mathrm{Var}(\phi_1 X_n + \phi_2 X_{n-1} + \cdots + \phi_p X_{n-p+1}) + \sigma_\epsilon^2 \\ &\ge \sigma_\epsilon^2. \end{aligned}$$

The difference between $\mathrm{Var}(X_{n+1})$ and $\sigma_\epsilon^2$ is generally significant, especially if the process has highly positive correlations among successive observations. A precise relation between the two variances is

$$\mathrm{Var}(X_i) = \frac{\sigma_\epsilon^2}{1 - \phi_1\rho_1 - \phi_2\rho_2 - \cdots - \phi_p\rho_p}.$$

This tells us the improvement of the quantizer input for the DPCM system compared to that for the PCM system.
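As a worked illustration of the variance relation, the sketch below computes the ratio of the signal variance to the noise variance for given AR coefficients. It assumes NumPy; the helper names are hypothetical, and the autocorrelations are obtained by solving the Yule-Walker relations $\rho_h = \sum_j \phi_j \rho_{|h-j|}$ with $\rho_0 = 1$.

```python
import numpy as np


def ar_autocorr(phi):
    """Autocorrelations rho_1..rho_p implied by AR coefficients phi_1..phi_p."""
    phi = np.asarray(phi, dtype=float)
    p = phi.size
    A = np.eye(p)
    b = np.zeros(p)
    for h in range(1, p + 1):
        for j in range(1, p + 1):
            lag = abs(h - j)
            if lag == 0:
                b[h - 1] += phi[j - 1]            # phi_j * rho_0 with rho_0 = 1
            else:
                A[h - 1, lag - 1] -= phi[j - 1]   # move phi_j * rho_lag to the left side
    return np.linalg.solve(A, b)


def variance_ratio(phi):
    """Var(X_i) / sigma_eps^2 = 1 / (1 - sum_j phi_j rho_j)."""
    phi = np.asarray(phi, dtype=float)
    return 1.0 / (1.0 - phi @ ar_autocorr(phi))


print(variance_ratio([0.9]))        # AR(1): 1 / (1 - 0.9**2)
print(variance_ratio([0.6, 0.3]))   # AR(2) example
```

A ratio well above 1 indicates that the residual fed to the DPCM quantizer has a much smaller variance than the raw signal a PCM quantizer would see.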

Since the prediction residuals are statistically uncorrelated, nothing more can be done to improve the modeling. What is needed now is to choose a quantizer working on the prediction errors. From the statistical point of view, the optimal choice is the L-M quantizer as mentioned previously. When a large number of bits are available, an alternative choice, without much loss in distortion, is the uniform quantizer with equal-size quantization intervals except for the first and the last. Other quantizers are also used in DPCM [36].

Some other quantization schemes were created under criteria other than the MSE. O’Neal suggested a scheme aimed at maximizing the sample entropy and showed that entropy quantization can increase the signal-to-noise ratio [11]. Further work has aimed at better subjective signal quality [37, 38].

Research has been conducted to reveal the nature of the quantization errors coming from different types of quantizers. Due to the nonlinear nature of the quantizers, an exact analysis of the quantization error is hard to carry out except in some special cases. Most work reported in the literature uses some kind of approximation based upon the limiting behavior.


2.4  Decorrelating  Transformation  and  Bit  Allocation 

An alternative approach to handling the redundancy in the sample is transform quantization [12, 39, 40]. Because it quantizes a group of observations at the same time rather than individually, the optimal block quantizer is the one that minimizes the total mean squared error of the whole block for a fixed total number of bits. The transformation removes the redundancy by transforming one block into another, either with uncorrelated components [15] or with fewer components to quantize [14]. A corresponding inverse transformation is then performed on the receiver side to restore the original block.

Linear transformation is most frequently used for decorrelating the sample [15, 41]. Assume $X = (x_1, x_2, \ldots, x_n)'$ is a random vector with mean vector 0 and covariance matrix $\Sigma$. The vector $X$ is linearly transformed by an $n \times n$ matrix $A$ to produce another vector $Y$ with uncorrelated components, which are quantized with a chosen quantizer and then transmitted via the channel. Another linear transformation by an $n \times n$ matrix $B$ is performed on the receiver side to yield the vector $\hat X$ as the reconstructed version of $X$. The problem is to choose the matrices $A$ and $B$ such that

$$D = \sum_{i=1}^{n} E[(x_i - \hat x_i)^2]$$

attains its minimum.

Huang and Schultheiss studied this problem and obtained the optimal choice of the matrices $A$ and $B$ [15]. Once $A$ is chosen, the optimal $B$ is simply the inverse matrix of $A$, i.e., $B = A^{-1}$. The best choice of the transform matrix $A$ satisfies

$$A \Sigma A' = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n),$$

where diag denotes the diagonal matrix and $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of the covariance matrix $\Sigma$ in descending order. It is easily seen that the rows of $A$ are eigenvectors of $\Sigma$. This transformation by $A$ and $B$ is known as the Karhunen-Loeve (K-L) transform.
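The diagonalizing property of the K-L transform can be checked numerically. The sketch below assumes NumPy and uses a first-order Markov covariance $\Sigma_{ij} = \rho^{|i-j|}$ purely as an illustrative input; the function name `klt_matrix` is hypothetical.

```python
import numpy as np


def klt_matrix(cov):
    """Rows of A are orthonormal eigenvectors of cov, eigenvalues descending."""
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order].T, eigvals[order]


n, rho = 4, 0.9
cov = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

A, lam = klt_matrix(cov)
B = A.T                       # for an orthonormal A, the inverse is the transpose
transformed_cov = A @ cov @ A.T

print(np.round(transformed_cov, 6))   # diagonal, with lam on the diagonal
```

Because `A` is orthonormal here, the receiver-side matrix $B = A^{-1}$ is simply the transpose, which keeps the inverse transform cheap once the eigenvectors are known.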

Though the K-L transform is optimal, it presents some problems in practical application [2, 3]. First, the covariance matrix is often unknown. Using an estimate from the previous block may lead to a loss of optimality if the sample is nonstationary. Second, the covariance matrix is sometimes nearly singular, so its eigenvectors are not uniquely determined. Furthermore, the computation is difficult and there is no fast algorithm associated with it.

The discrete cosine transform (DCT), a member of a large family called sinusoidal transforms, is frequently used in practice [42]. The DCT has a performance close to that of the K-L transform but is much easier to implement.

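The transform matrix $A$ in DCT is defined by

$$a_{ij} = \frac{c_i}{\sqrt{n}} \cos\left[\frac{(2j - 1)(i - 1)\pi}{2n}\right], \quad i, j = 1, 2, \ldots, n,$$

where $a_{ij}$ is the entry in the $i$th row and $j$th column and

$$c_i = \begin{cases} 1 & \text{for } i = 1, \\ \sqrt{2} & \text{for } i = 2, 3, \ldots, n. \end{cases}$$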

Other members of the sinusoidal family, such as the discrete Fourier transform and the Hadamard transform, have performance equivalent to that of the DCT [43].
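To illustrate why the DCT is a practical substitute for the K-L transform, the sketch below builds an orthonormal DCT matrix and measures how much off-diagonal energy remains after transforming a highly correlated covariance matrix. It assumes NumPy and the standard orthonormal DCT-II normalization with zero-based indices; the first-order Markov covariance and the helper names are illustrative assumptions.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k, column m (both zero-based)."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    A = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    A[0, :] /= np.sqrt(2.0)          # scale the constant row so that A @ A.T = I
    return A


def offdiag_fraction(M):
    """Share of the squared entries of M that lies off the diagonal."""
    total = np.sum(M ** 2)
    return (total - np.sum(np.diag(M) ** 2)) / total


n, rho = 8, 0.9
cov = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
A = dct_matrix(n)

before = offdiag_fraction(cov)
after = offdiag_fraction(A @ cov @ A.T)
print(before, after)
```

The transformed covariance is not exactly diagonal, as it would be for the K-L transform, but most of the correlation is removed without any eigenvector computation.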

Once $A$ and $B$ are defined, another problem is to determine the bit allocation among the components in the block. Huang and Schultheiss studied this problem and obtained an optimal solution [15]. Assume $b$ bits are available for quantizing the whole block and $b_i$ denotes the number of bits assigned to the $i$th component. The mean squared error can be written as

$$\sum_{i=1}^{n} E[(x_i - \hat x_i)^2] \approx \sum_{i=1}^{n} K \sigma_i^2 2^{-2 b_i}, \tag{2.11}$$

where $\sigma_i^2$ is the variance of the $i$th transformed component. Setting the derivative of the above expression to zero subject to $\sum_{i=1}^{n} b_i = b$, the optimal bit allocation design is given by

$$b_i = \frac{b}{n} + \frac{1}{2} \log_2 \frac{\sigma_i^2}{[\mathrm{Det}(\Sigma)]^{1/n}}, \tag{2.12}$$

where Det denotes the determinant of a square matrix.

The solution (2.12) sometimes contains unrealistic negative values. An adjustment was made by Segall to ensure a nonnegative solution [44].

The constant $K$ in (2.11) is greater than 1. Substituting (2.12) into (2.11), we get a lower bound for the average mean squared error under Huang and Schultheiss’ scheme,

$$\frac{1}{n} \sum_{i=1}^{n} E[(x_i - \hat x_i)^2] \ge 2^{-2b/n} [\mathrm{Det}(\Sigma)]^{1/n}.$$
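The allocation rule (2.12) is easy to evaluate. The following sketch assumes NumPy and uncorrelated transformed components, so that $\mathrm{Det}(\Sigma)$ equals the product of the component variances; the function name and the example variances are hypothetical. It does not include Segall's nonnegativity adjustment.

```python
import numpy as np


def optimal_bits(variances, total_bits):
    """b_i = b/n + 0.5 * log2(sigma_i^2 / (prod_j sigma_j^2)^(1/n))."""
    variances = np.asarray(variances, dtype=float)
    n = variances.size
    log_var = np.log2(variances)
    log_geo_mean = log_var.mean()          # log2 of the geometric mean of the variances
    return total_bits / n + 0.5 * (log_var - log_geo_mean)


variances = np.array([16.0, 4.0, 1.0, 0.25])
bits = optimal_bits(variances, total_bits=8)
per_component_error = variances * 2.0 ** (-2.0 * bits)   # sigma_i^2 * 2^(-2 b_i)

print(bits)                  # allocations sum to the total of 8 bits
print(per_component_error)   # equalized across components
```

The allocation equalizes the per-component distortion $\sigma_i^2 2^{-2 b_i}$, which is why components with larger variance receive more bits. With a wide spread of variances or few total bits, some $b_i$ can come out negative, which is the case Segall's adjustment addresses.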

Another approach to block quantization is to quantize the whole block as a vector, with the quantization performed in the corresponding vector space. Fischer and Dicharry [45] suggested a locally optimal vector quantizer for Gaussian, gamma and Laplacian sources and showed that in the low-dimensional case (2-6), the improvement in MSE over the scalar quantizer is significant for gamma and Laplacian sources. Vector quantization usually requires complicated computation, and some research has been done to reduce the complexity [46, 47].

2.5  Adaptive  Schemes 

The conventional DPCM and transform schemes described above are used to quantize stationary signals. Some adaptation is needed when dealing with nonstationary signals in order to keep the efficiency of the original scheme [48]. In each category of quantization schemes mentioned above, different components of the scheme can be made adaptive.

A DPCM system consists of two key components, namely, the predictor and the quantizer, both of which can be made adaptive. Predictor adaptation is done by changing the prediction parameters to fit the statistical behavior of the local sample [19, 20]. In these schemes, the parameters are adjusted once every period of a certain length, and the new parameters are then used in the prediction for the next period. The quantizer is often adapted by altering the sizes of the quantization intervals, which are enlarged or reduced by a certain amount in accordance with the variation of the previous sample [16, 49].

The adaptive quantizer has attracted more attention than the adaptive predictor because it is easier to realize. Most of the previous coding literature has been limited to linear predictors with fixed parameters and, based on this assumption, the emphasis has been put on improving the performance of different types of quantizers. Cummiskey, Jayant and Flanagan [48] and Jayant [16] discussed a uniform adaptive quantizer which adapts the interval size according to which slot of the quantizer the previous sample falls in. The new interval size is obtained by multiplying the old interval size by a time-invariant function of that slot. Let $\Delta_i$ be the interval size for quantizing observation $x_i$. Then

$$\hat x_i = P_i \frac{\Delta_i}{2}, \qquad P_i = \pm 1, \pm 3, \ldots, \pm(2^b - 1),$$

if a $b$-bit quantizer is used, and

$$\Delta_i = \Delta_{i-1} M(|P_{i-1}|).$$

The adaptive multiplier $M$ is determined by the interval that the previous quantized value belongs to. $M < 1$ is assigned for small values of $|P_{i-1}|$, which means $x_{i-1}$ falls into an interval located near the center of its range. On the other hand, if $x_{i-1}$ falls into an interval near the edge of its range, i.e., the value of $|P_{i-1}|$ is large, we take $M > 1$. The desired values of $M$ were found by computer simulation.

Note that the adaptation depends only on information available at the receiver end, so no adaptive parameter needs to be transmitted.
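The step-size adaptation can be sketched for a 2-bit quantizer as follows. This is a minimal illustration assuming NumPy; the multiplier values 0.8 and 1.6 are illustrative choices for this sketch rather than values taken from the text, and the function names are hypothetical.

```python
import numpy as np

BITS = 2
MAX_P = 2 ** BITS - 1
# Illustrative multipliers M(|P|): shrink for the inner slot, expand for the outer slot.
MULTIPLIER = {1: 0.8, 3: 1.6}


def quantize_index(x, step):
    """Odd integer P in {+-1, +-3} such that P * step / 2 is the output level."""
    cell = int(np.floor(abs(x) / step))
    p = min(2 * cell + 1, MAX_P)
    return p if x >= 0 else -p


def encode(signal, step0):
    step, codes, recon = step0, [], []
    for x in signal:
        p = quantize_index(x, step)
        codes.append(p)
        recon.append(p * step / 2)
        step *= MULTIPLIER[abs(p)]      # Delta_i = Delta_{i-1} * M(|P_{i-1}|)
    return codes, recon


def decode(codes, step0):
    """The receiver rebuilds the step sizes from the codes alone."""
    step, recon = step0, []
    for p in codes:
        recon.append(p * step / 2)
        step *= MULTIPLIER[abs(p)]
    return recon


rng = np.random.default_rng(0)
signal = rng.standard_normal(500) * np.linspace(0.2, 5.0, 500)  # slowly growing scale
codes, recon_tx = encode(signal, step0=1.0)
recon_rx = decode(codes, step0=1.0)
```

The decoder receives only `codes` and the agreed initial step size, yet reproduces the encoder's reconstruction, which is the sense in which no adaptive parameter has to be transmitted.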

Another adaptive quantizer was created by Noll, in which the adaptation is controlled by the adjusted estimate of the sample variance [50].

In practice, controlling the prediction error is sometimes more efficient and more important. Because the prediction error accumulates during the whole transmission process, any present error will affect all future predictions. Thus, the handling of the parameter variation may affect the prediction error greatly. Several predictor-adaptive DPCM schemes have been created to deal with this case.

Cummiskey suggested an adaptive predictor obtained by a steepest descent gradient search, which can be illustrated as follows [18]. Suppose $x_1, x_2, \ldots$ are a sample from an AR(p) process. The optimal predictor for $x_i$ is $\sum_{j=1}^{p} \phi_j x_{i-j}$, and the corresponding squared prediction residual is $e_i^2 = (x_i - \sum_{j=1}^{p} \phi_j x_{i-j})^2$. The gradient of $e_i^2$ with respect to the parameters $\{\phi_j\}$ is given by

$$\mathrm{Grad}(e_i^2) = \begin{pmatrix} \partial e_i^2 / \partial\phi_1 \\ \partial e_i^2 / \partial\phi_2 \\ \vdots \\ \partial e_i^2 / \partial\phi_p \end{pmatrix} = - \begin{pmatrix} 2 e_i x_{i-1} \\ 2 e_i x_{i-2} \\ \vdots \\ 2 e_i x_{i-p} \end{pmatrix}.$$

By definition, the gradient points in the direction in which $e_i^2$ increases fastest. So $\{\phi_j\}$ should be adapted in the direction opposite to the gradient. The increments, after normalization, become

$$\Delta\phi_j = \frac{K e_i x_{i-j}}{\sum_{k=1}^{p} x_{i-k}^2}, \tag{2.13}$$

where $K$ is the adapting rate.

Cummiskey also suggested that if the bit rate is not too small ($b > 2$), the transmission of the prediction coefficients can be avoided by determining these coefficients from the reconstructed signal on the receiver side, and the cost of doing this is rather small. Atal and Schroeder described a scheme in which the prediction parameters are periodically readjusted and suggested a bit rate for the transmission of the residuals together with the parameters. But no attempt was made to optimize the quantization of the parameters [19].
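The normalized gradient update (2.13) can be sketched as below. This is an illustration only, assuming NumPy, an unquantized input signal, and a hypothetical adapting rate of 0.02; the simulated AR(2) data and helper names are not from the original.

```python
import numpy as np


def simulate_ar(phi, n, seed=0, burn_in=500):
    """Hypothetical helper: AR(p) path driven by unit-variance Gaussian noise."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = phi.size
    eps = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for i in range(p, n + burn_in):
        x[i] = phi @ x[i - p:i][::-1] + eps[i]
    return x[burn_in:]


def adapt_predictor(x, p, rate=0.02):
    """Apply Delta phi_j = K * e_i * x_{i-j} / sum_k x_{i-k}^2 at every sample."""
    phi = np.zeros(p)
    for i in range(p, len(x)):
        past = x[i - p:i][::-1]            # (x[i-1], ..., x[i-p])
        e = x[i] - phi @ past              # prediction residual with current phi
        energy = past @ past
        if energy > 0:
            phi += rate * e * past / energy
    return phi


def residual_variance(x, phi):
    p = len(phi)
    preds = np.array([phi @ x[i - p:i][::-1] for i in range(p, len(x))])
    return np.mean((x[p:] - preds) ** 2)


x = simulate_ar([0.6, 0.3], n=20000)
phi_adapted = adapt_predictor(x, p=2)
mse_adapted = residual_variance(x, phi_adapted)
print(phi_adapted, mse_adapted, np.var(x))
```

A larger rate tracks parameter changes faster but leaves the coefficients noisier once they have settled, which is the usual trade-off in choosing $K$.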

In  transform  quantization,  adaptation  can  be  made  either  to  the  transformation 
matrix  or  to  the  bit  allocation  design.  The  adaptive  transform  matrix  can  remove 
the  local  redundancy  and  feed  the  quantizer  with  better  input.  The  adaptive  bit 
allocation  can  improve  the  quantization  efficiency  with  fixed  bit  rate. 

An adaptive transform method, aimed at adapting the transform matrix $A$, was proposed by Tasto and Wintz [49, 51]. Assume $X = (x_1, x_2, \ldots, x_n)'$ is a random vector with

$$U_X = E(X) \quad \text{and} \quad C_X = E[(X - U_X)(X - U_X)'].$$

The transform matrix $A$ needs to be adjusted according to the variation of the covariance matrix $C_X$. The adaptation is made by partitioning the sample space $\Omega$ into $L$ disjoint subsets $\{S_i\}$ with corresponding probabilities $\{P_i\}$ such that the vectors from the same subset are expected to have homogeneous statistical characteristics and will be transformed by the same matrix $A_i$. The problem now is how to find the optimal partition such that the distortion is minimized. Tasto and Wintz suggested that the subset $S_k$ should contain those $X$ for which

$$D_k(X) = \min_i D_i(X),$$

where

$$D_i(X) = \alpha_i + (X - U_i)' C_i^{-1} (X - U_i),$$

$$U_i = E(X \mid S_i),$$

$$C_i = E[(X - U_i)(X - U_i)' \mid S_i],$$

and $\alpha_i$ is some constant.

The optimal choices of $L$ and $P_i$ were suggested according to simulation results.

Another adaptive transform quantization scheme, focusing on the adaptation of the bit assignment, was suggested by Chen and Smith [14]. In this scheme, the image blocks are classified into several classes according to a measure of block activity. More bits go to the active blocks than to the quiet ones. Within each block, the bits are distributed according to the covariance matrix of the transformed data by using Huang and Schultheiss’ scheme.
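The Tasto and Wintz assignment rule, which places a vector $X$ in the subset with the smallest $D_i(X)$, can be sketched as follows. The example assumes NumPy; the class means, covariances and constants $\alpha_i$ are made-up values used only to exercise the rule.

```python
import numpy as np


def classify(x, means, covs, alphas):
    """Return the index k minimizing D_i(x) = alpha_i + (x - U_i)' C_i^{-1} (x - U_i)."""
    scores = []
    for u, c, a in zip(means, covs, alphas):
        d = x - u
        scores.append(a + d @ np.linalg.solve(c, d))   # avoids forming C_i^{-1} explicitly
    return int(np.argmin(scores))


# Two hypothetical subsets with different means and identity covariances.
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2)]
alphas = [0.0, 0.0]

k_near_origin = classify(np.array([0.2, -0.1]), means, covs, alphas)
k_near_five = classify(np.array([4.8, 5.3]), means, covs, alphas)
print(k_near_origin, k_near_five)
```

Each block would then be transformed with the matrix $A_k$ associated with its assigned subset.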

Cuperman  and  Gersho  [52]  proposed  an  adaptive  scheme  in  which  a low-dimensional 
vector  quantizer  is  used  in  an  adaptive  predictive  coding  scheme.  The  current  input 
vector  is  predicted  and  the  residual  vector  is  coded  by  a vector  quantizer. 


CHAPTER  3 


THE  BAYESIAN  ADAPTIVE  DPCM  APPROACH 


3.1  Introduction 


A new approach, which we call Bayesian adaptive DPCM, is developed in this chapter to deal with nonstationary sequences. We are concerned with nonstationary $p$th order autoregressive processes with varying parameters, which can be modeled by

$$x_t = \sum_{i=1}^{p} \phi_i x_{t-i} + \epsilon_t. \tag{3.1}$$

The new scheme, which considers the quantization of both the parameters and the residuals, is developed using the techniques of DPCM and block quantization combined with the Bayesian principle. To overcome the nonstationarity, we divide the time series into small segments and draw a sample from each of them. If the sampling frequency is high and the segment is small, it is reasonable to assume that the sample from this segment is stationary. The signal is then decomposed into a series of stationary subsequences, each governed by a set of constant parameters. The parameters are estimated within each sample and then used to make predictions. The residuals and the parameters are treated as a block in the quantization and transmission processes. The parameters, whose values vary from sample to sample, are treated as random variables. Motivated by Bayesian analysis, a prior density function based on previous experience is assigned to the parameters and used to determine the optimal quantization.

Bit allocation needs to be considered in block quantization since each component in the block makes a different contribution to the distortion. In our approach, the components in the block can be divided into three groups. The first group contains all the parameters. The quantization of the parameters affects the performance of the prediction. Quantizing the parameters coarsely increases the variance of the residuals but, on the other hand, quantizing them finely requires more channel capacity and leaves fewer bits available for the quantization of the residuals. So they need special attention in the quantization procedure. The second group contains the first $p$ observations in each sample. As each block is treated individually, the first $p$ observations at the beginning of the block are not predictable, and they have to be quantized directly. The third group is formed by all the prediction residuals. Since they generally have smaller variance than the components in the second group, it may not be optimal to distribute the available bits equally among them.

In Bayesian analysis, a loss function is needed as the criterion for the performance of the procedure. For our problem, the total mean squared error is naturally taken to play this role. Our aim is to obtain the optimal design for bit allocation among the parameters $\{\phi_1, \phi_2, \ldots, \phi_p\}$, the first $p$ observations and the remaining prediction residuals so as to minimize the total mean squared quantization error, under the assumption that each sample forms a stationary AR(p) sequence.

In 3.2, the model and quantization procedure adopted in our new scheme are introduced and some related properties are discussed. In 3.3, an approximation for the variance of the prediction residuals is derived, taking the parameter quantization into account. The limiting behavior of the residuals is closely examined. An alternative proof of the optimal bit allocation for block quantization is given in 3.4. The above results are applied in 3.5, where the optimal bit allocation for the Bayesian DPCM is obtained. Simulations are performed and the results are compared with other schemes in 3.6.


3.2  Model  and  Quantization 

We have mentioned in Chapter 1 that samples from different segments of the signal are treated individually and the association between segments is not considered. Let $\{x_1, x_2, \ldots, x_n\}$ be a sample drawn from a $p$th order autoregressive process. Since the prediction is based only on a finite number of observations, a truncated version of the AR(p) model is adopted in our scheme, which is defined by

$$x_i \sim N(0, \sigma^2), \quad i = 1, 2, \ldots, p,$$

$$x_i = \Phi' X_{i-1} + \epsilon_i, \quad i = p+1, p+2, \ldots, n, \tag{3.2}$$

where the $\epsilon_i$'s are independent and identically distributed (i.i.d.) with normal distribution $N(0, \sigma_\epsilon^2)$, $\Phi' = (\phi_1, \phi_2, \ldots, \phi_p)$ and $X_{i-1}' = (x_{i-1}, x_{i-2}, \ldots, x_{i-p})$.

Suppose $\Phi$ has a known prior density function $\pi(\phi_1, \phi_2, \ldots, \phi_p)$ defined on the admissible region, by which we mean the set of points with coordinates $(\phi_1, \phi_2, \ldots, \phi_p)$ which make the model (3.2) stationary. It is well known that for $p = 1$ and $p = 2$, the admissible regions are

$$\{\phi_1 : |\phi_1| < 1\}$$

and

$$\{(\phi_1, \phi_2) : \phi_2 + \phi_1 < 1,\ \phi_2 - \phi_1 < 1,\ \text{and}\ |\phi_2| < 1\}$$

respectively. Furthermore, we assume the parameter vector $\Phi$ is independent of the white noise $\{\epsilon_i\}$.

The first $p$ observations, $x_1, x_2, \ldots, x_p$, are not predictable due to the lack of previous information. To simplify the notation, we define the first $p$ prediction errors to be equal to the observations, and they will be quantized directly. Model (3.2) is used to predict the later observations $x_i$, $i = p+1, p+2, \ldots, n$, and the residuals will be quantized and transmitted. By definition, the residuals can be written as

$$e_i = \begin{cases} x_i, & \text{if } i = 1, 2, \ldots, p, \\ x_i - \hat\Phi' \hat X_{i-1}, & \text{if } i = p+1, p+2, \ldots, n, \end{cases} \tag{3.3}$$

where $\hat\Phi$ and $\hat X_{i-1}$ denote the quantized values of $\Phi$ and $X_{i-1}$ respectively. The same notation will be used for other random variables throughout this paper unless otherwise stated. To avoid confusion, it should be pointed out that $\hat x_i$ is obtained by adding $\hat e_i$ to its predicted value rather than by quantizing $x_i$ directly. Therefore, $x_i - \hat x_i$ is called a reconstruction error instead of a quantization error.
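Whether a parameter vector lies in the admissible region of model (3.2) can be checked numerically from the roots of $1 - \phi_1 z - \cdots - \phi_p z^p$. The sketch below assumes NumPy and compares the root test with the triangle conditions stated for $p = 2$; the function names are hypothetical.

```python
import numpy as np


def is_admissible(phi):
    """True if all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    phi = np.asarray(phi, dtype=float)
    # np.roots expects coefficients from the highest power down to the constant term.
    coeffs = np.concatenate((-phi[::-1], [1.0]))
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))


def in_triangle(phi1, phi2):
    """Closed-form admissible region for p = 2."""
    return (phi2 + phi1 < 1) and (phi2 - phi1 < 1) and (abs(phi2) < 1)


for phi1, phi2 in [(0.6, 0.3), (0.8, 0.3), (0.0, -0.5), (1.5, -0.6), (0.5, -1.2)]:
    print((phi1, phi2), is_admissible([phi1, phi2]), in_triangle(phi1, phi2))
```

For $p > 2$ the region has no equally simple description, so a root test of this kind is one practical way to restrict a prior to the admissible region.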

The quantized values of the prediction errors are transmitted via the channel. If the channel is error-free, it is easy to show that the reconstruction error $x_i - \hat x_i$ is simply equal to the quantization error $e_i - \hat e_i$. For $i \le p$, this is trivial by definition. For $i > p$,

$$x_i - \hat x_i = (x_i - \hat\Phi' \hat X_{i-1}) - (\hat x_i - \hat\Phi' \hat X_{i-1}) = e_i - \hat e_i.$$
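The identity between reconstruction error and quantization error can be verified with a small DPCM loop. The sketch below assumes NumPy and, for simplicity, uses a uniform rounding quantizer as a stand-in for the L-M quantizer; the coefficient values, the step size and the test signal are illustrative assumptions.

```python
import numpy as np


def uniform_quantize(value, step):
    """Stand-in quantizer: round to the nearest multiple of step."""
    return step * np.round(value / step)


def dpcm(x, phi_hat, step):
    """Predict from reconstructed values, quantize the residual, reconstruct."""
    phi_hat = np.asarray(phi_hat, dtype=float)
    p, n = phi_hat.size, len(x)
    x_rec = np.zeros(n)    # reconstructed values
    e = np.zeros(n)        # residuals e_i
    e_q = np.zeros(n)      # quantized residuals
    for i in range(n):
        if i < p:
            pred = 0.0     # the first p values are quantized directly
        else:
            pred = phi_hat @ x_rec[i - p:i][::-1]
        e[i] = x[i] - pred
        e_q[i] = uniform_quantize(e[i], step)
        x_rec[i] = pred + e_q[i]
    return x_rec, e, e_q


rng = np.random.default_rng(2)
x = np.cumsum(rng.standard_normal(300)) * 0.3     # arbitrary correlated test signal
step = 0.25
x_rec, e, e_q = dpcm(x, phi_hat=[0.9, 0.05], step=step)
```

Because the predictor works on reconstructed values, which the receiver also has, the quantization error does not accumulate: each reconstruction error equals the quantization error of the current residual.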

The optimality of the L-M quantizer has been shown in the previous chapter, and it is chosen to perform the quantization in our new scheme. The quantized values and the quantization intervals for the standard normal variable are given by Max [5]. For the general normal variable, the elements of the quantizer can be obtained by the next lemma.

Lemma 3.2.1

Assume $X$ is a random variable with MSE quantizer $\hat X$. Let $a$ and $b$ be real constants. Then the MSE quantizer for $Y = a + bX$ is given by $\hat Y = a + b\hat X$, and the mean square quantization error is given by

$$E[(Y - \hat Y)^2] = b^2 E[(X - \hat X)^2].$$

Proof of Lemma 3.2.1:

Let $\tilde Y$ be any quantizer of $Y$. We have

$$E[(Y - \tilde Y)^2] = b^2 E\left[\left(X - \frac{\tilde Y - a}{b}\right)^2\right] \ge b^2 E[(X - \hat X)^2].$$

The equality holds if and only if

$$\frac{\tilde Y - a}{b} = \hat X.$$

So we get

$$\hat Y = a + b\hat X.$$

This lemma allows us to assume that the variable $x$ has zero mean and unit variance without loss of generality.

We will use the properties of the L-M quantizer stated in 2.2 in the derivation of the new scheme. In particular, property (2) will be used repeatedly in different forms. It is helpful to list all the forms of this property.

Lemma  3.2.2 

Let $X$ be a random variable and $\hat X$ the L-M quantizer. Then, the following statements are equivalent.

1. $E(X \hat X) = E(\hat X^2)$.

2. $E[(X - \hat X)\hat X] = 0$.

3. $E[(X - \hat X)X] = E[(X - \hat X)^2]$.

4. $E[(X - \hat X)^2] = E(X^2) - E(\hat X^2)$.

The  proof  is  easy  and  so  is  omitted. 
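The four statements of Lemma 3.2.2 can be checked empirically. The sketch below assumes NumPy and designs a quantizer by Lloyd's iteration on a large Gaussian sample, treating the empirical distribution as the source; the function name, the sample size and the number of iterations are illustrative choices, and the sample-based design is only an approximation to the L-M quantizer for the continuous density.

```python
import numpy as np


def lloyd_max(samples, levels, iters=200):
    """Alternate nearest-level partitioning and centroid updates on a sample."""
    reps = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        edges = (reps[:-1] + reps[1:]) / 2          # midpoints between output levels
        cells = np.searchsorted(edges, samples)     # cell index of every sample
        reps = np.array([samples[cells == k].mean() for k in range(levels)])
    # reps are the centroids of the partition described by cells
    return reps, cells


rng = np.random.default_rng(1)
x = rng.standard_normal(50_000)
reps, cells = lloyd_max(x, levels=4)
x_hat = reps[cells]
err = x - x_hat

print(np.mean(err * x_hat))                                  # statement 2
print(np.mean(x * x_hat), np.mean(x_hat ** 2))               # statement 1
print(np.mean(err * x), np.mean(err ** 2))                   # statement 3
print(np.mean(err ** 2), np.mean(x ** 2) - np.mean(x_hat ** 2))  # statement 4
```

All four identities follow from the centroid condition alone: each output level is the mean of the values mapped to it, so the error is orthogonal to the quantized value.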

As mentioned in Chapter 2, the L-M scheme was created to quantize a single observation or a set of uncorrelated variables. We need to consider the quantization of the parameters, which are not necessarily uncorrelated. The next theorem adjusts the L-M quantizer so that it can be used to handle a set of correlated variables, provided that the joint distribution is known.

Theorem  3.2.1 

Let $X = (x_1, x_2, \ldots, x_p)'$ be a random vector with joint density $f(x_1, x_2, \ldots, x_p)$ and $X_{(-i)} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p)'$ the vector obtained from $X$ with the $i$th component omitted. The quantizer minimizing the total mean square quantization error $E[\sum_{i=1}^{p} (x_i - \hat x_i)^2]$ is obtained by applying the L-M quantizer to each component $x_i$ with respect to the conditional density $f(x_i \mid X_{(-i)})$.

Proof of Theorem 3.2.1

Define

$$D = \sum_{i=1}^{p} E[(x_i - \hat x_i)^2].$$

Then,

$$\begin{aligned} D &= \sum_{i=1}^{p} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_i - \hat x_i)^2 f(x_1, x_2, \ldots, x_p)\, dx_1\, dx_2 \cdots dx_p \\ &= \sum_{i=1}^{p} \int \left[ \int_{-\infty}^{\infty} (x_i - \hat x_i)^2 f(x_i \mid X_{(-i)})\, dx_i \right] f(X_{(-i)})\, dX_{(-i)}. \end{aligned}$$

Minimizing the above expression is equivalent to minimizing

$$\int_{-\infty}^{\infty} (x_i - \hat x_i)^2 f(x_i \mid X_{(-i)})\, dx_i$$

for all $i$, which can be done by applying the L-M quantizer to each component $x_i$ with respect to the conditional probability. This completes the proof.

This theorem is useful when the parameters in the admissible region of the model (3.2) are restricted by one another, i.e., when the values a parameter can take depend on the values of the other parameters.

3.3  Limiting  Distribution  of  the  Prediction  Residuals 


Before quantizing the residuals, we need to examine their statistical character and determine what factors affect it. Among the different sources, the noise is an important one causing prediction error. Moreover, since prediction is made with the quantized parameters, the prior and the quantizer chosen for the quantization of the parameters are also closely related to the accuracy of the prediction. If these factors are all determined, the residuals $\{e_i\}$ depend only on the total number of bits and the bit allocation.

Because of the intersample correlation, the exact distribution of $\{e_i\}$ is hard to obtain. When the number of available bits for the quantization of the sample is large and the L-M quantizer is used, it can be proved that $\{e_i\}$ converges to $\{\epsilon_i\}$ in distribution.

Lemma  3.3.1 

Let $X$ be a random variable with finite second moment whose probability density function $f(x)$ has bounded range. Let $\hat X$ be the L-M quantizer for $X$. Then, $\hat X \to X$ in mean square when the quantization level $v$ tends to infinity.

Proof of Lemma 3.3.1

By Lemma 3.2.2,

$$E[(X - \hat X)^2] \le E(X^2) < \infty.$$

Recall from the approximation (2.7),

$$E[(X - \hat X)^2] = \frac{1}{12 v^2} \left[ \int f^{1/3}(x)\, dx \right]^3 + o(v^{-2}) \to 0$$

as $v \to \infty$.


In 3.1, we mentioned that the three groups of elements in a block consist of the parameters, the first $p$ observations and the last $n - p$ prediction residuals respectively. It will be proved in the next section that the optimal bit allocation is determined by the variances of the components if they are uncorrelated. Since $\{e_i, i = 1, 2, \ldots, p\}$ have a common variance and so do $\{e_i, i = p+1, p+2, \ldots, n\}$, we will assign the same number of bits to the components inside each group. We now find the limiting distribution of $\{e_i\}$ based on the above bit allocation.

Theorem  3.3.1 

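Let $\{e_i\}$ and $\{\epsilon_i\}$ be defined as in the previous section. Let $\{v_{\phi_j}\}$, $v_1$ and $v_2$ denote the quantization levels of $\{\phi_j\}$, $\{e_i, i = 1, 2, \ldots, p\}$ and $\{e_i, i = p+1, p+2, \ldots, n\}$ respectively. Suppose that the L-M scheme is used for quantization of the parameters and the residuals. Then, $\{e_i\}$ converges to $\{\epsilon_i\}$ in mean square if $\{v_{\phi_j}\}$, $v_1$ and $v_2$ tend to infinity.

Proof of Theorem 3.3.1

For $i \le p$, $e_i = \epsilon_i$ by definition.

For $i > p$,

$$E[(e_i - \epsilon_i)^2] = E\left\{\left[\Phi'(X_{i-1} - \hat X_{i-1}) + (\Phi - \hat\Phi)' \hat X_{i-1}\right]^2\right\}. \tag{3.4}$$

Since $(a + b)^2 \le 2a^2 + 2b^2$ for any real numbers $a$ and $b$, the expression (3.4) is less than or equal to

$$2E\left[\sum_{j=1}^{p} \sum_{k=1}^{p} (x_{i-j} - \hat x_{i-j})(x_{i-k} - \hat x_{i-k}) \phi_j \phi_k\right] + 2E\left[\sum_{j=1}^{p} \sum_{k=1}^{p} \hat x_{i-j} \hat x_{i-k} (\phi_j - \hat\phi_j)(\phi_k - \hat\phi_k)\right]. \tag{3.5}$$

Recall that $x_i - \hat x_i = e_i - \hat e_i$ and that the admissible region of model (3.2) is bounded. Hence

$$\begin{aligned} E\left[\sum_{j=1}^{p} \sum_{k=1}^{p} \left|(x_{i-j} - \hat x_{i-j})(x_{i-k} - \hat x_{i-k}) \phi_j \phi_k\right|\right] &\le \sum_{j=1}^{p} \sum_{k=1}^{p} \sup(|\phi_j \phi_k|)\, E\left|(e_{i-j} - \hat e_{i-j})(e_{i-k} - \hat e_{i-k})\right| \\ &\le \sum_{j=1}^{p} \sum_{k=1}^{p} \sup(|\phi_j \phi_k|) \sqrt{E[(e_{i-j} - \hat e_{i-j})^2]\, E[(e_{i-k} - \hat e_{i-k})^2]} \\ &\to 0, \end{aligned}$$

as $\{v_{\phi_j}\}$, $v_1$ and $v_2$ tend to infinity, by Lemma 3.3.1 and the Cauchy-Schwarz inequality. Since $x$ has a normal distribution, it has a finite fourth moment. So

$$\begin{aligned} E\left[\sum_{j=1}^{p} \sum_{k=1}^{p} \left|\hat x_{i-j} \hat x_{i-k} (\phi_j - \hat\phi_j)(\phi_k - \hat\phi_k)\right|\right] &\le \sum_{j=1}^{p} \sum_{k=1}^{p} 2 \sup(|\phi_k|) \sqrt{E[(\hat x_{i-j} \hat x_{i-k})^2]\, E[(\phi_j - \hat\phi_j)^2]} \\ &\to 0, \end{aligned}$$

as $\{v_{\phi_j}\}$, $v_1$ and $v_2$ tend to infinity.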

Theorem 3.3.1 showed that the prediction residual $e_i$ tends to $\epsilon_i$ marginally in mean square. We now show that $\{e_i\}$ are asymptotically independent.




Theorem  3.3.2 

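Let $\{e_i\}$ be defined as in Theorem 3.3.1. Then, $\{e_i, i > p\}$ are asymptotically independent.

Proof of Theorem 3.3.2

Since $\{e_i\}$ have an approximately normal marginal distribution when the quantization level is large, it suffices to prove that they are asymptotically uncorrelated.

$$\begin{aligned} E|e_i e_j - \epsilon_i \epsilon_j| &= E|(e_i e_j - e_i \epsilon_j) + (e_i \epsilon_j - \epsilon_i \epsilon_j)| \\ &\le E|e_i (e_j - \epsilon_j)| + E|(e_i - \epsilon_i) \epsilon_j| \\ &\le \sqrt{E(e_i^2)\, E[(e_j - \epsilon_j)^2]} + \sqrt{E[(e_i - \epsilon_i)^2]\, E(\epsilon_j^2)} \\ &\to 0, \end{aligned}$$

as $\{v_{\phi_j}\}$, $v_1$ and $v_2 \to \infty$. So

$$E(e_i e_j) \to E(\epsilon_i \epsilon_j) = 0$$

for $i, j = p+1, p+2, \ldots, n$, $i \ne j$. This proves the asymptotic independence of $\{e_i, i > p\}$.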

By Theorems 3.3.1 and 3.3.2, the residuals $\{e_i, i > p\}$ can be approximated by $\{\epsilon_i, i > p\}$ when the number of available bits is large. So we can handle $\{e_i, i > p\}$ as a set of i.i.d. random variables, which is a great convenience in what follows. But it is obvious that, at small quantization levels, the common variance of $\{e_i, i > p\}$, say $\sigma_e^2$, is greater than that of $\{\epsilon_i, i > p\}$. The increment in the variance is caused by the quantization errors of the parameters as well as the quantization errors of the previous residuals. It is desirable to examine the influence of these two factors on the prediction error in order to find the optimal bit allocation.

By definition, the prediction error $\{e_i, i > p\}$ can be written as

$$\begin{aligned} e_i &= x_i - \hat\Phi' \hat X_{i-1} \\ &= x_i - \Phi' X_{i-1} + \Phi' X_{i-1} - \Phi' \hat X_{i-1} + \Phi' \hat X_{i-1} - \hat\Phi' \hat X_{i-1} \\ &= \epsilon_i + (\Phi - \hat\Phi)' \hat X_{i-1} + \Phi'(X_{i-1} - \hat X_{i-1}). \end{aligned}$$

Then,

$$\begin{aligned} \sigma_e^2 &= E(e_i^2) \\ &= \sigma_\epsilon^2 + E[(\Phi - \hat\Phi)' \hat X_{i-1} \hat X_{i-1}' (\Phi - \hat\Phi)] \\ &\quad + E[\Phi'(X_{i-1} - \hat X_{i-1})(X_{i-1} - \hat X_{i-1})' \Phi] \\ &\quad + 2E[\epsilon_i (\Phi - \hat\Phi)' \hat X_{i-1}] + 2E[\epsilon_i \Phi'(X_{i-1} - \hat X_{i-1})] \\ &\quad + 2E[(\Phi - \hat\Phi)' \hat X_{i-1} (X_{i-1} - \hat X_{i-1})' \Phi]. \end{aligned}$$

The fourth and fifth terms are zero since $\epsilon_i$ is independent of $\Phi$ and $X_{i-1}$. The last term is approximately zero by the results given in the next two theorems.

Theorem 3.3.3

Let $X_1, X_2, \ldots, X_p$ be the first $p$ unpredictable variables from model (3.2). Then,

$$E[(X_i - \hat X_i) \hat X_j] = 0, \quad i, j = 1, 2, \ldots, p.$$

Proof of Theorem 3.3.3

If $i = j$, the result follows directly from property (2) of the L-M quantizer.

Assume $i \ne j$. Without loss of generality, let $i > j$. Considering the conditional expectation of $X_i$ given $X_k$, $k = i-1, i-2, \ldots, 1$, under model (3.2), we have

$$E[X_i \mid X_{i-1} = x_{i-1}, X_{i-2} = x_{i-2}, \ldots, X_1 = x_1] = -E[X_i \mid X_{i-1} = -x_{i-1}, X_{i-2} = -x_{i-2}, \ldots, X_1 = -x_1]. \tag{3.6}$$

By Lemma 3.2.1, we have $\widehat{(-X)} = -\hat X$. So the conditional expectation of the quantization error is an odd function of $\{x_k, k = i-1, i-2, \ldots, 1\}$, i.e.,

$$E[(X_i - \hat X_i) \mid X_{i-1} = x_{i-1}, X_{i-2} = x_{i-2}, \ldots, X_1 = x_1] = -E[(X_i - \hat X_i) \mid X_{i-1} = -x_{i-1}, X_{i-2} = -x_{i-2}, \ldots, X_1 = -x_1]. \tag{3.7}$$

Since $(X_{i-1}, X_{i-2}, \ldots, X_1)'$ is a multivariate normal random vector with mean zero,

$$E[(X_i - \hat X_i) \mid X_{i-1}, X_{i-2}, \ldots, X_1] = 0.$$

So

$$E[(X_i - \hat X_i) \hat X_j] = E\{\hat X_j E[(X_i - \hat X_i) \mid X_{i-1}, X_{i-2}, \ldots, X_1]\} = 0$$

for $i, j = 1, 2, \ldots, p$.
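Theorem 3.3.4

$\mathbf{X}_{i-1}$ and $\hat{\mathbf{X}}_{i-1}$ are defined as above. Then,

$$E\left[\hat{\mathbf{X}}_{i-1}(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})' \mid \Phi\right] \to 0, \quad (3.8)$$

as $\{v_{\phi_j}\}$, $v_1$ and $v_2$ tend to infinity.

Proof of Theorem 3.3.4

The element in the $j$th row and $k$th column of the matrix $\hat{\mathbf{X}}_{i-1}(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'$ is
$\hat{x}_{i-j}(x_{i-k} - \hat{x}_{i-k})$. We now prove that the mean of each element for fixed parameter
$\Phi$ tends to zero.

If both $i-j$ and $i-k$ are less than $p$, the result follows by Theorem 3.3.3. Therefore,
we need only consider the case that at least one of $i-j$ and $i-k$ is greater than or
equal to $p$.

If $j = k$, note that $x_{i-j} - \hat{x}_{i-j} = \tilde{e}_{i-j} - \hat{e}_{i-j}$ and $\hat{x}_{i-j} = \hat{\Phi}'\hat{\mathbf{X}}_{i-j-1} + \hat{e}_{i-j}$. Then,

$$
\begin{aligned}
E[\hat{x}_{i-j}(x_{i-j} - \hat{x}_{i-j}) \mid \Phi]
&= E\left[(\hat{\Phi}'\hat{\mathbf{X}}_{i-j-1} + \hat{e}_{i-j})(\tilde{e}_{i-j} - \hat{e}_{i-j}) \mid \Phi\right] \\
&= E\left[\hat{\Phi}'\hat{\mathbf{X}}_{i-j-1}(\tilde{e}_{i-j} - \hat{e}_{i-j}) \mid \Phi\right]
 + E\left[\hat{e}_{i-j}(\tilde{e}_{i-j} - \hat{e}_{i-j}) \mid \Phi\right].
\end{aligned}
$$

By the properties of the L-M quantizer,

$$E[(\tilde{e}_{i-j} - \hat{e}_{i-j}) \mid \Phi] = 0, \qquad E[\hat{e}_{i-j}(\tilde{e}_{i-j} - \hat{e}_{i-j}) \mid \Phi] = 0.$$

Since $\tilde{e}_{i-j}$ is asymptotically independent of $\hat{\mathbf{X}}_{i-j-1}$, the conditional mean
vanishes in this case.

If $j > k$,

$$
E[\hat{x}_{i-j}(x_{i-k} - \hat{x}_{i-k}) \mid \Phi]
= E[\hat{x}_{i-j}(\tilde{e}_{i-k} - \hat{e}_{i-k}) \mid \Phi] \to 0
$$

as $v_1$ and $v_2 \to \infty$ by the asymptotic independence of $\hat{x}_{i-j}$ and $\tilde{e}_{i-k}$.

If $j < k$,

$$
\begin{aligned}
E[\hat{x}_{i-j}(x_{i-k} - \hat{x}_{i-k}) \mid \Phi]
&= E\left[(\hat{\Phi}'\hat{\mathbf{X}}_{i-j-1} + \hat{e}_{i-j})(\tilde{e}_{i-k} - \hat{e}_{i-k}) \mid \Phi\right] \\
&= E\left[\left(\sum_{l=1}^{p} f_l(\Phi)x_l + \sum_{l=p+1}^{i-k} g_l(\Phi)e_l\right)(\tilde{e}_{i-k} - \hat{e}_{i-k}) \,\Big|\, \Phi\right] \to 0
\end{aligned}
$$

as $v_1$ and $v_2 \to \infty$, where $f_l(\Phi)$ and $g_l(\Phi)$ are some functions obtained by
recursively expressing $x_i$ by the previous observations using model (3.2).

This completes the proof.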

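After eliminating the cross-product term, $\sigma_{\tilde{e}}^2$ may be written as

$$
\sigma_{\tilde{e}}^2 = \sigma_e^2 + E\left[(\Phi - \hat{\Phi})'\hat{\mathbf{X}}_{i-1}\hat{\mathbf{X}}_{i-1}'(\Phi - \hat{\Phi})\right]
+ E\left[\Phi'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'\Phi\right].
$$

The above expression still contains $\hat{\mathbf{X}}_{i-1}$, whose distribution is hard to obtain. We
have the following asymptotic result replacing $\hat{\mathbf{X}}_{i-1}$ by $\mathbf{X}_{i-1}$ and $(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})$.

Lemma 3.3.2

Under the assumptions in model (3.2), we have

$$
\begin{aligned}
E\left[(\Phi - \hat{\Phi})'\hat{\mathbf{X}}_{i-1}\hat{\mathbf{X}}_{i-1}'(\Phi - \hat{\Phi})\right]
&= E\left[(\Phi - \hat{\Phi})'\mathbf{X}_{i-1}\mathbf{X}_{i-1}'(\Phi - \hat{\Phi})\right] \\
&\quad - E\left[(\Phi - \hat{\Phi})'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'(\Phi - \hat{\Phi})\right].
\end{aligned}
$$

Proof of Lemma 3.3.2

$$
\begin{aligned}
E\left[(\Phi - \hat{\Phi})'\mathbf{X}_{i-1}\mathbf{X}_{i-1}'(\Phi - \hat{\Phi})\right]
&= E\left[(\Phi - \hat{\Phi})'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1} + \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1} + \hat{\mathbf{X}}_{i-1})'(\Phi - \hat{\Phi})\right] \\
&= E\left[(\Phi - \hat{\Phi})'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'(\Phi - \hat{\Phi})\right] \\
&\quad + E\left[(\Phi - \hat{\Phi})'\hat{\mathbf{X}}_{i-1}\hat{\mathbf{X}}_{i-1}'(\Phi - \hat{\Phi})\right] \\
&\quad + 2E\left[(\Phi - \hat{\Phi})'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})\hat{\mathbf{X}}_{i-1}'(\Phi - \hat{\Phi})\right].
\end{aligned}
$$

The last term vanishes by Theorem 3.3.4 and the result follows.

By Lemma 3.3.2, we have

$$
\begin{aligned}
\sigma_{\tilde{e}}^2 &= \sigma_e^2 + E\left[(\Phi - \hat{\Phi})'\mathbf{X}_{i-1}\mathbf{X}_{i-1}'(\Phi - \hat{\Phi})\right] \\
&\quad + E\left[\Phi'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'\Phi\right] \\
&\quad - E\left[(\Phi - \hat{\Phi})'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'(\Phi - \hat{\Phi})\right]. \quad (3.9)
\end{aligned}
$$

From (3.9), we can see the influence on the mean squared prediction error from
different sources. The first term is the effect of the quantization errors of the
parameters and the second term is that of the previous residuals. The last term
is the interaction factor.

Using Lemma 3.3.2 again, the last two terms can be merged into one:

$$
\begin{aligned}
&E\left[\Phi'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'\Phi\right]
 - E\left[(\Phi - \hat{\Phi})'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'(\Phi - \hat{\Phi})\right] \\
&\qquad = E\left[\hat{\Phi}'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'\hat{\Phi}\right].
\end{aligned}
$$

So

$$
\sigma_{\tilde{e}}^2 = \sigma_e^2 + E\left[(\Phi - \hat{\Phi})'\mathbf{X}_{i-1}\mathbf{X}_{i-1}'(\Phi - \hat{\Phi})\right]
+ E\left[\hat{\Phi}'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'\hat{\Phi}\right]. \quad (3.10)
$$

Equation (3.10) will be used to determine the optimal bit allocation.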


3.4  Minimum  MSE  Bit  Allocation 


In the L-M scheme, the quantization error is a function of the quantization level
if the input of the quantizer has a fixed distribution. The approximation given by
(2.6) reveals the association between the mean squared quantization error and the
quantization levels.

Consider a set of uncorrelated random variables with unequal variances that are to be
quantized as a whole. Assuming that $\{x_i, i = 1, 2, \ldots, n\}$ are independent normal
random variables with variances $\{\sigma_i^2, i = 1, 2, \ldots, n\}$ respectively, we are going to
find the best choice, say $b_i$, of the number of bits assigned to $x_i$.

Huang and Schultheiss [15] studied this problem and obtained an optimal scheme
for distributing bits among independent variables with unequal variances. Formula
(2.12) given by Huang and Schultheiss can be used to approximate the optimal number
of bits assigned to each variable.

Huang and Schultheiss' result can be obtained by a different approach which
reveals some properties of the optimal bit allocation.

Lemma 3.4.1

Let $x_1$ and $x_2$ be real variables, and $a_1, a_2, p$ and $c$ be positive real constants. Then,
subject to $x_1^p x_2 = c$,

$$f(x_1, x_2) = \frac{a_1}{x_1} + \frac{a_2}{x_2}$$

is minimized if and only if

$$\frac{x_2}{x_1} = \frac{p a_2}{a_1}.$$

Proof of Lemma 3.4.1

From the restriction, we have

$$x_2 = c\, x_1^{-p}$$

and

$$\frac{dx_2}{dx_1} = -\frac{p x_2}{x_1}.$$

Hence,

$$\frac{d}{dx_1} f(x_1, x_2) = -\frac{a_1}{x_1^2} + \frac{p a_2}{x_1 x_2}.$$

The first order partial derivative equals zero if and only if

$$\frac{x_2}{x_1} = \frac{p a_2}{a_1}.$$

It is easy to check that the second order partial derivative is positive, so the extreme
value we found is indeed the minimum.
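
The stationarity condition of Lemma 3.4.1 can be checked numerically. The following is a minimal sketch, not part of the original derivation: the constants are arbitrary illustrative values, and the helper names `objective` and `lemma_minimizer` are hypothetical. It eliminates $x_2$ through the constraint, computes the minimizer implied by $x_2/x_1 = p a_2/a_1$, and compares it with a brute-force grid search.

```python
def objective(x1, a1, a2, p, c):
    """f(x1, x2) = a1/x1 + a2/x2 with x2 eliminated through x1**p * x2 = c."""
    x2 = c / x1**p
    return a1 / x1 + a2 / x2


def lemma_minimizer(a1, a2, p, c):
    """Minimizer implied by the stationarity condition x2/x1 = p*a2/a1."""
    ratio = p * a2 / a1                      # x2 = ratio * x1
    x1 = (c / ratio) ** (1.0 / (p + 1))      # from x1**p * (ratio * x1) = c
    return x1, ratio * x1


# Arbitrary illustrative constants (assumptions, not taken from the text).
a1, a2, p, c = 3.0, 2.0, 4, 10.0
x1_star, x2_star = lemma_minimizer(a1, a2, p, c)

# Independent check: brute-force search over x1 on a grid with spacing 0.001.
grid = [0.001 * k for k in range(1, 10001)]
x1_grid = min(grid, key=lambda x1: objective(x1, a1, a2, p, c))

print(x1_star, x2_star, x1_grid)
```

Since the reduced objective is convex in $x_1 > 0$, the grid minimizer should fall within one grid step of the closed-form value.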

Theorem 3.4.1

Let $X_i \sim N(0, \sigma_i^2)$, $i = 1, 2, \ldots, n$, be the input for the L-M quantizer with a total of
$b$ bits available. Then when $b$ is large, the asymptotic minimum MSE bit allocation
will assign $b_i$ bits to $X_i$ such that

$$\frac{\sigma_1^2}{v_1^2} = \frac{\sigma_2^2}{v_2^2} = \cdots = \frac{\sigma_n^2}{v_n^2},$$

where $v_i = 2^{b_i}$ is the quantization level for $X_i$.

Proof of Theorem 3.4.1

Since $\sum_{i=1}^n b_i = b$, we have $\prod_{i=1}^n v_i = 2^{\sum_{i=1}^n b_i} = 2^b$. Recall the approximation
formula for the quantization error (2.6) as well as the result given by Lemma 3.2.1:

$$E[(X_i - \hat{X}_i)^2] \approx \frac{\sigma_i^2}{12 v_i^2}\left[\int_{-\infty}^{\infty} f^{1/3}(x)\,dx\right]^3,$$

where $f(x) = (2\pi)^{-1/2} e^{-x^2/2}$ is the standard normal density function.

The constant part contributes nothing to the minimization, so it can
be removed from the expression of the total MSE. Now it suffices to minimize $\sum_{i=1}^n \sigma_i^2 / v_i^2$
subject to $\prod_{i=1}^n v_i = \text{Constant}$.

In the case of $n = 2$: write $a_i = \sigma_i^2$ and $x_i = v_i^2$ for $i = 1, 2$; then the problem is
an immediate result of Lemma 3.4.1.

Assume the result holds for $n - 1$. By induction we now show it also holds for $n$.

Note that

$$\sum_{i=1}^{n-1} \frac{\sigma_i^2}{v_i^2} = \frac{(n-1)\sigma_1^2}{v_1^2}$$

and

$$\prod_{i=1}^{n} v_i = v_1 \left(\prod_{i=2}^{n-1} \frac{\sigma_i}{\sigma_1} v_1\right) v_n = \left(\prod_{i=2}^{n-1} \frac{\sigma_i}{\sigma_1}\right) v_1^{n-1} v_n = C\, v_1^{n-1} v_n,$$

say, by our assumption. The problem is reduced to minimizing

$$\frac{(n-1)\sigma_1^2}{v_1^2} + \frac{\sigma_n^2}{v_n^2}$$

subject to $v_1^{n-1} v_n = 2^b / C$.

Setting $a_1 = (n-1)\sigma_1^2$, $a_2 = \sigma_n^2$, $x_1 = v_1^2$, $x_2 = v_n^2$, and $p = n - 1$, the result
follows by using Lemma 3.4.1 again.

Theorem 3.4.2

Suppose the total number of the available bits is fixed. Then, Huang and Schultheiss'
formula (2.12) gives approximately the optimal bit assignment among independent
normal random variables $X_1, X_2, \ldots, X_n$ in the sense of attaining the minimum MSE
with the L-M quantizer.

Proof of Theorem 3.4.2

By Huang and Schultheiss' formula (2.12), the quantization level for $X_i$ is

$$v_i = 2^{b_i} = 2^{b/n}\, \frac{\sigma_i}{\left(\prod_{j=1}^{n} \sigma_j\right)^{1/n}}. \quad (3.11)$$

Hence,

$$\frac{\sigma_i^2}{v_i^2} = 2^{-2b/n} \left(\prod_{j=1}^{n} \sigma_j^2\right)^{1/n}, \quad i = 1, 2, \ldots, n. \quad (3.12)$$

The result follows from Theorem 3.4.1.
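
As an illustration of Theorems 3.4.1 and 3.4.2, the sketch below applies the classical Huang and Schultheiss allocation, $b_i = b/n + \tfrac{1}{2}\log_2\left(\sigma_i^2 / (\prod_j \sigma_j^2)^{1/n}\right)$, which is assumed here to be the content of formula (2.12). The function name and the example variances are hypothetical. The allocation is treated as continuous, so it may return fractional (or, for very small variances, negative) bit counts that would need rounding in practice.

```python
import math


def huang_schultheiss_bits(variances, total_bits):
    """Continuous bit allocation b_i = b/n + 0.5*log2(var_i / geometric mean)."""
    n = len(variances)
    log_geo_mean = sum(math.log2(v) for v in variances) / n
    return [total_bits / n + 0.5 * (math.log2(v) - log_geo_mean) for v in variances]


# Illustrative variances (assumed values, not from the text).
variances = [4.0, 1.0, 0.25]
bits = huang_schultheiss_bits(variances, total_bits=12)

# Theorem 3.4.1: sigma_i^2 / v_i^2 with v_i = 2**b_i is the same for every i.
normalized_errors = [v / 4.0**b for v, b in zip(variances, bits)]

print(bits)               # [5.0, 4.0, 3.0]
print(normalized_errors)  # all equal to 1/256
```

The variable with four times the variance receives exactly one extra bit, which is what equalizing $\sigma_i^2 / v_i^2$ requires.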

Suppose that we have a total of $s$ bits available for quantizing each block of the
signal from an AR(p) process, and $m$ bits will be assigned to the parameters $\Phi$.
Let $m_j$ be the number of bits used to quantize $\phi_j$, so the quantization level for $\phi_j$ is
$v_{\phi_j} = 2^{m_j}$. For the $b$ bits left after assigning $m$ bits to the parameters, let $b_1$ and $b_2$ be
the numbers of bits assigned to the first $p$ observations and the last $n - p$ prediction
residuals respectively. Then, by Huang and Schultheiss' formula, $b_1$ and $b_2$ can be
calculated by

$$
\begin{aligned}
b_1 &= \frac{b}{n} + \frac{1}{2}\log_2 \frac{E(\gamma_0)\sigma_e^2}{\left\{[E(\gamma_0)\sigma_e^2]^p (\sigma_e^2)^{n-p}\right\}^{1/n}}
 = \frac{b}{n} + \frac{1}{2}\log_2 [E(\gamma_0)]^{\frac{n-p}{n}}, \\
b_2 &= \frac{b}{n} + \frac{1}{2}\log_2 \frac{\sigma_e^2}{\left\{[E(\gamma_0)\sigma_e^2]^p (\sigma_e^2)^{n-p}\right\}^{1/n}}
 = \frac{b}{n} + \frac{1}{2}\log_2 [E(\gamma_0)]^{-\frac{p}{n}},
\end{aligned}
\quad (3.13)
$$

where $\gamma_0 = \sigma_x^2 / \sigma_e^2$ is a function of $\Phi$, and the corresponding quantization levels are

$$
\begin{aligned}
v_1 &= 2^{b_1} = 2^{b/n}\,[E(\gamma_0)]^{\frac{n-p}{2n}}, \\
v_2 &= 2^{b_2} = 2^{b/n}\,[E(\gamma_0)]^{-\frac{p}{2n}},
\end{aligned}
\quad (3.14)
$$

respectively.

It should be pointed out that the above formula will overassign bits to $\{e_i, i =
1, 2, \ldots, p\}$, since $\sigma_e^2$ is used instead of $\sigma_{\tilde{e}}^2$. This does not cause much difference when
$n$ is large. If $n$ is small, it should be adjusted by using an iterative method. Use $\sigma_e^2$ as
the initial value to calculate $b_1$ and $b_2$ from (3.13); then use $b_1$ and $b_2$ to obtain the
approximation of $\sigma_{\tilde{e}}^2$ derived in the next section. Repeat this procedure recursively
until the new values of $\{b_i, i = 1, 2\}$ are close enough to the previous ones.

Note that $b = s - m$ and from (3.14), we have

$$\frac{v_1}{v_2} = [E(\gamma_0)]^{1/2}. \quad (3.15)$$

So after $\{v_{\phi_1}, v_{\phi_2}, \ldots, v_{\phi_p}\}$ are chosen, $v_1$ and $v_2$ are determined by (3.15). If the
prior of $\Phi$ and the variance of $e_i$ are known, $\sigma_{\tilde{e}}^2$ is a function depending only on
$\{v_{\phi_1}, v_{\phi_2}, \ldots, v_{\phi_p}\}$.


3.5  Optimal  Bit  Allocation  among  Parameters 

In the previous section, the minimum MSE bit allocation for quantizing a set of
uncorrelated random variables was discussed and the procedure was used to determine
the bit distribution among the prediction residuals. But the same procedure cannot
be simply applied to determine the bit distribution among the parameters, since we
are facing a different situation. The purpose of the parameter quantization is to
get a minimum MSE reconstruction of the original signal, not of the parameters
themselves. The parameters contribute differently to the reconstruction error,
and the bit allocation should be decided with this purpose in mind.

Since the prior density of $\{\phi_1, \phi_2, \ldots, \phi_p\}$ is defined on the admissible region of
the AR(p) model, the domain of one parameter often depends on the values taken by
the others. So the parameters are generally not uncorrelated. In this case, Theorem
3.2.1 will be applied to get the optimal quantization.

We are now going to find the functional form of $E[(\tilde{e}_i - \hat{e}_i)^2]$ expressed as a
function of the quantization levels. To do so, we need the next two lemmas.

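Lemma 3.5.1

Let $x$ be a variable with density function $f(x)$, and let $\hat{x}$ be the L-M quantizer with
quantization level $v$. Then, we have the next approximation based on a large value of
$v$:

$$
\int_{-\infty}^{\infty} (x - \hat{x})^2 g(x) f(x)\,dx
\approx \frac{1}{12 v^2}\left[\int_{-\infty}^{\infty} f^{1/3}(x)\,dx\right]^2 \int_{-\infty}^{\infty} g(x) f^{1/3}(x)\,dx, \quad (3.16)
$$

where $g(x)$ is a continuous real function.

Proof of Lemma 3.5.1

Let $-\infty = c_0 < c_1 < \ldots < c_v = \infty$ be the end points of the quantization intervals
and $q_i$ be the quantized value in $(c_{i-1}, c_i]$. Define $\Delta_i = c_i - c_{i-1}$. Note that the integral
$\int_{c_{i-1}}^{c_i} g(x)\,dx$ can be approximated by $g(q_i)\Delta_i$ if $v$ is large. So

$$
\int_{-\infty}^{\infty} (x - \hat{x})^2 g(x) f(x)\,dx
= \sum_{i=1}^{v} \int_{c_{i-1}}^{c_i} (x - q_i)^2 g(x) f(x)\,dx. \quad (3.17)
$$

Note that $g(x) \approx g(q_i)$ for $x \in \Delta_i$, so it may be moved outside the integral. From
the derivation of the mean square quantization error in Section 2.2, we know that

$$\int_{c_{i-1}}^{c_i} f^{1/3}(x)\,dx = C, \quad \text{for } i = 1, 2, \ldots, v,$$

where $C$ is some constant. We have

$$
\begin{aligned}
\sum_{i=1}^{v} g(q_i) \int_{c_{i-1}}^{c_i} (x - q_i)^2 f(x)\,dx
&\approx \frac{1}{12 v^3}\left[\int_{-\infty}^{\infty} f^{1/3}(x)\,dx\right]^2 \sum_{i=1}^{v}\left[\int_{-\infty}^{\infty} f^{1/3}(x)\,dx\right] g(q_i) \\
&\approx \frac{1}{12 v^3}\left[\int_{-\infty}^{\infty} f^{1/3}(x)\,dx\right]^2 \sum_{i=1}^{v}\left[v f^{1/3}(q_i)\Delta_i\right] g(q_i) \\
&= \frac{1}{12 v^2}\left[\int_{-\infty}^{\infty} f^{1/3}(x)\,dx\right]^2 \sum_{i=1}^{v} g(q_i) f^{1/3}(q_i)\Delta_i \\
&\approx \frac{1}{12 v^2}\left[\int_{-\infty}^{\infty} f^{1/3}(x)\,dx\right]^2 \int_{-\infty}^{\infty} g(x) f^{1/3}(x)\,dx.
\end{aligned}
$$

In the special case when $g(x) = 1$, the above expression becomes

$$\frac{1}{12 v^2}\left[\int_{-\infty}^{\infty} f^{1/3}(x)\,dx\right]^3,$$

which gives the approximation for the mean square error.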
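
For the standard normal density, the special case $g(x) = 1$ can be evaluated directly. The sketch below is illustrative only: the function names, the integration range and the number of steps are assumptions. It integrates $f^{1/3}$ with a simple midpoint rule and compares the resulting constant $\frac{1}{12}\left[\int f^{1/3}(x)\,dx\right]^3$ with its closed form $\sqrt{3}\pi/2$, which follows from $\int e^{-x^2/6}\,dx = \sqrt{6\pi}$.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cube_root_integral(pdf, lo=-30.0, hi=30.0, steps=20000):
    """Midpoint-rule approximation of the integral of pdf(x)**(1/3) over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(pdf(lo + (k + 0.5) * h) ** (1.0 / 3.0) for k in range(steps))


def asymptotic_mse(pdf, levels):
    """Large-v approximation (1 / (12 v^2)) * (integral of pdf^(1/3))^3."""
    return cube_root_integral(pdf) ** 3 / (12.0 * levels**2)


# With v = 1 the expression reduces to the constant multiplying 1 / v^2.
numeric_constant = asymptotic_mse(normal_pdf, levels=1)
closed_form = math.sqrt(3.0) * math.pi / 2.0

print(numeric_constant, closed_form)
print(asymptotic_mse(normal_pdf, levels=16))  # approximation for a 4-bit quantizer
```

Because the formula is asymptotic in $v$, the value it gives for a small number of levels should be read as an approximation rather than the exact L-M error.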

Lemma 3.5.2

Let $x$ be a variable with density function $f(x)$, and let $\hat{x}$ be the L-M quantizer with
quantization level $v$. Let $g(x)$ be a continuous real function with $g'(x)$ bounded. Then,

$$\int_{-\infty}^{\infty} (x - \hat{x}) g(x) f(x)\,dx = O(v^{-2}), \quad (3.18)$$

where $O(v^n)$ means at most the same order of magnitude as $v^n$.

Proof of Lemma 3.5.2

Let $\{c_i\}$ and $\{q_i\}$ be defined as in Lemma 3.5.1. By a Taylor expansion, $g(x)$ can
be expressed, in each quantization interval, as

$$g(x) = g(q_i) + g'(q_i)(x - q_i) + o((x - q_i)).$$

By eliminating the higher order infinitesimal,

$$
\begin{aligned}
\left|\int_{-\infty}^{\infty} (x - \hat{x}) g(x) f(x)\,dx\right|
&\approx \left|\sum_{i=1}^{v} g(q_i)\int_{c_{i-1}}^{c_i} (x - q_i) f(x)\,dx + \sum_{i=1}^{v} g'(q_i)\int_{c_{i-1}}^{c_i} (x - q_i)^2 f(x)\,dx\right| \\
&\le M \int_{-\infty}^{\infty} (x - \hat{x})^2 f(x)\,dx,
\end{aligned}
$$

where $M$ is a bound on $|g'(x)|$. Note that the first term in the next to last expression
is zero and the result follows from (2.7).


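Theorem 3.5.1

If the quantization levels are large, we have the next approximation

$$
E\left[(\Phi - \hat{\Phi})'\mathbf{X}_{i-1}\mathbf{X}_{i-1}'(\Phi - \hat{\Phi})\right] \approx \sigma_e^2 \sum_{j=1}^{p} \frac{G_j}{v_{\phi_j}^2},
$$

where

$$
\begin{aligned}
G_j &= \frac{1}{12}\int \left[\int \pi_{j\mid(-j)}^{1/3}\,d\phi_j\right]^2 \left[\int \gamma_0\, \pi_{j\mid(-j)}^{1/3}\,d\phi_j\right] \pi_{(-j)}\,d\Phi_{(-j)}, \\
\pi_{j\mid(-j)} &= \pi(\phi_j \mid \phi_1, \ldots, \phi_{j-1}, \phi_{j+1}, \ldots, \phi_p), \\
\pi_{(-j)} &= \pi(\phi_1, \ldots, \phi_{j-1}, \phi_{j+1}, \ldots, \phi_p).
\end{aligned}
$$

Proof of Theorem 3.5.1

$$
\begin{aligned}
E\left[(\Phi - \hat{\Phi})'\mathbf{X}_{i-1}\mathbf{X}_{i-1}'(\Phi - \hat{\Phi})\right]
&= E\left\{E\left[(\Phi - \hat{\Phi})'\mathbf{X}_{i-1}\mathbf{X}_{i-1}'(\Phi - \hat{\Phi}) \mid \Phi\right]\right\} \\
&= E\left[(\Phi - \hat{\Phi})'\,\Gamma\sigma_e^2\,(\Phi - \hat{\Phi})\right] \\
&= \sigma_e^2\, E\left[\sum_{j=1}^{p}\sum_{k=1}^{p} \gamma_{j-k}(\phi_j - \hat{\phi}_j)(\phi_k - \hat{\phi}_k)\right], \quad (3.19)
\end{aligned}
$$

where

$$
\Gamma = \begin{pmatrix}
\gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{p-1} \\
\gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{p-2} \\
\gamma_2 & \gamma_1 & \gamma_0 & \cdots & \gamma_{p-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_{p-1} & \gamma_{p-2} & \gamma_{p-3} & \cdots & \gamma_0
\end{pmatrix}
$$

is such that $E[(\mathbf{X}_{i-1}\mathbf{X}_{i-1}') \mid \Phi] = \Gamma\sigma_e^2$.

Let $-\infty = c_{j0} < c_{j1} < \ldots < c_{j v_{\phi_j}} = \infty$ be the end points of the quantization
intervals of $\phi_j$, $\Delta_{jl} = c_{jl} - c_{j,l-1}$, $l = 1, 2, \ldots, v_{\phi_j}$, and $q_{jl}$ be the quantized value
in $\Delta_{jl}$. We now prove that the terms in (3.19) with index $j \neq k$ are higher order
infinitesimals than those with $j = k$.

If $j \neq k$,

$$
\begin{aligned}
E\left[\gamma_{j-k}(\phi_j - \hat{\phi}_j)(\phi_k - \hat{\phi}_k)\right]
&= E\left\{E\left[\gamma_{j-k}(\phi_j - \hat{\phi}_j)(\phi_k - \hat{\phi}_k) \mid \Phi_{(-j)}\right]\right\} \\
&= E\left\{(\phi_k - \hat{\phi}_k)\, E\left[\gamma_{j-k}(\phi_j - \hat{\phi}_j) \mid \Phi_{(-j)}\right]\right\}. \quad (3.20)
\end{aligned}
$$

Note that $E\left[\gamma_{j-k}(\phi_j - \hat{\phi}_j) \mid \Phi_{(-j)}\right] = O(v_{\phi_j}^{-2})$. Using Lemma 3.5.2 again, we have

$$
E\left[\gamma_{j-k}(\phi_j - \hat{\phi}_j)(\phi_k - \hat{\phi}_k)\right] = O(v_{\phi_j}^{-2} v_{\phi_k}^{-2}) = o(v_{\phi_l}^{-2})
$$

for $l = j, k$. So we can consider only the terms with $j = k$:

$$
\begin{aligned}
E\left[\gamma_0(\phi_j - \hat{\phi}_j)^2\right]
&= E\left\{E\left[\gamma_0(\phi_j - \hat{\phi}_j)^2 \mid \phi_1, \ldots, \phi_{j-1}, \phi_{j+1}, \ldots, \phi_p\right]\right\} \\
&\approx \int \frac{1}{12 v_{\phi_j}^2}\left[\int \pi_{j\mid(-j)}^{1/3}\,d\phi_j\right]^2 \left[\int \gamma_0\, \pi_{j\mid(-j)}^{1/3}\,d\phi_j\right] \pi_{(-j)}\,d\Phi_{(-j)}
 = \frac{G_j}{v_{\phi_j}^2}.
\end{aligned}
$$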


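So by eliminating the higher order infinitesimals,

$$
E\left[(\Phi - \hat{\Phi})'\mathbf{X}_{i-1}\mathbf{X}_{i-1}'(\Phi - \hat{\Phi})\right] \approx \sigma_e^2 \sum_{j=1}^{p} \frac{G_j}{v_{\phi_j}^2}.
$$

Theorem 3.5.2

$$
E\left[\hat{\Phi}'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'\hat{\Phi}\right]
\approx \sum_{j=1}^{p}\left[E(\phi_j^2) - \frac{D_j}{v_{\phi_j}^2}\right]\frac{I\sigma_{\tilde{e}}^2}{v_2^2},
$$

where

$$
D_j = \frac{1}{12}\int \left[\int \pi_{j\mid(-j)}^{1/3}\,d\phi_j\right]^3 \pi_{(-j)}\,d\Phi_{(-j)}, \qquad
I = \frac{1}{12}\left[\int_{-\infty}^{\infty} f^{1/3}(x)\,dx\right]^3,
$$

with $\pi_{j\mid(-j)}$ and $\pi_{(-j)}$ the conditional and marginal prior densities used in Theorem 3.5.1,
and $f(x)$ is the standard normal density function.

Proof of Theorem 3.5.2

$$
\begin{aligned}
E\left[\hat{\Phi}'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'\hat{\Phi}\right]
&= E\left[\sum_{j=1}^{p}\sum_{k=1}^{p} (x_{i-j} - \hat{x}_{i-j})(x_{i-k} - \hat{x}_{i-k})\hat{\phi}_j\hat{\phi}_k\right] \\
&= E\left[\sum_{j=1}^{p}\sum_{k=1}^{p} (\tilde{e}_{i-j} - \hat{e}_{i-j})(\tilde{e}_{i-k} - \hat{e}_{i-k})\hat{\phi}_j\hat{\phi}_k\right].
\end{aligned}
$$

Since all $e_i$'s have the same quantization error by Theorem 3.4.2, we can ignore
whether $i < p$ or $i > p$.

If $j \neq k$, then

$$E\left[(\tilde{e}_{i-j} - \hat{e}_{i-j})(\tilde{e}_{i-k} - \hat{e}_{i-k})\hat{\phi}_j\hat{\phi}_k\right] \approx 0,$$

since the $e_i$'s are asymptotically independent of the $\phi_j$'s and independent of one another.
If $j = k$, then

$$E\left[(\tilde{e}_{i-j} - \hat{e}_{i-j})^2\hat{\phi}_j^2\right] \approx E\left[(\tilde{e}_{i-j} - \hat{e}_{i-j})^2\right] E\left[\hat{\phi}_j^2\right],$$

because of the independence of the $e_i$'s and $\phi_j$'s. By applying Lemma 3.2.2, we get

$$
\begin{aligned}
E\left[\hat{\phi}_j^2\right] &= E\left[\phi_j^2\right] - E\left[(\phi_j - \hat{\phi}_j)^2\right] \\
&= E\left[\phi_j^2\right] - \int\left[\int (\phi_j - \hat{\phi}_j)^2 \pi_{j\mid(-j)}\,d\phi_j\right]\pi_{(-j)}\,d\Phi_{(-j)} \\
&\approx E\left[\phi_j^2\right] - \int \frac{1}{12 v_{\phi_j}^2}\left[\int \pi_{j\mid(-j)}^{1/3}\,d\phi_j\right]^3 \pi_{(-j)}\,d\Phi_{(-j)} \\
&= E\left[\phi_j^2\right] - \frac{D_j}{v_{\phi_j}^2},
\end{aligned}
$$

and

$$E\left[(\tilde{e}_{i-j} - \hat{e}_{i-j})^2\right] \approx \frac{I\sigma_{\tilde{e}}^2}{v_2^2}$$

by Lemma 3.5.1. So

$$
E\left[\hat{\Phi}'(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})(\mathbf{X}_{i-1} - \hat{\mathbf{X}}_{i-1})'\hat{\Phi}\right]
\approx \sum_{j=1}^{p}\left[E(\phi_j^2) - \frac{D_j}{v_{\phi_j}^2}\right]\frac{I\sigma_{\tilde{e}}^2}{v_2^2}.
$$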


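Combining Theorems 3.5.1 and 3.5.2, the expression of $\sigma_{\tilde{e}}^2$ given in (3.10) can be
written as

$$
\sigma_{\tilde{e}}^2 \approx \sigma_e^2 + \sigma_e^2\sum_{j=1}^{p}\frac{G_j}{v_{\phi_j}^2}
+ \sum_{j=1}^{p}\left[E(\phi_j^2) - \frac{D_j}{v_{\phi_j}^2}\right]\frac{I\sigma_{\tilde{e}}^2}{v_2^2}. \quad (3.21)
$$

The above equation implies that $\sigma_{\tilde{e}}^2$ is a multiple of $\sigma_e^2$ for fixed quantization levels.
We can write

$$\sigma_{\tilde{e}}^2 = h^2(v_{\phi_1}, \ldots, v_{\phi_p}, v_1, v_2)\,\sigma_e^2, \quad (3.22)$$

where

$$
h^2 = \frac{1 + \sum_{j=1}^{p} G_j v_{\phi_j}^{-2}}{1 - \sum_{j=1}^{p}\left[E(\phi_j^2) - D_j v_{\phi_j}^{-2}\right] I v_2^{-2}}. \quad (3.23)
$$

By Lemma 3.2.1, the mean squared quantization error has the expression

$$
E\left[(\tilde{e}_i - \hat{e}_i)^2\right] \approx \sigma_e^2\,
\frac{1 + \sum_{j=1}^{p} G_j v_{\phi_j}^{-2}}{v_2^2 I^{-1} - \sum_{j=1}^{p}\left[E(\phi_j^2) - D_j v_{\phi_j}^{-2}\right]}. \quad (3.24)
$$

We can now write the total mean squared error that we want to minimize as

$$
\begin{aligned}
\sum_{i=1}^{n} E\left[(x_i - \hat{x}_i)^2\right]
&= \sum_{i=1}^{p} E\left[(x_i - \hat{x}_i)^2\right] + \sum_{i=p+1}^{n} E\left[(\tilde{e}_i - \hat{e}_i)^2\right] \\
&\approx n\sigma_e^2\,\frac{1 + \sum_{j=1}^{p} G_j v_{\phi_j}^{-2}}{v_2^2 I^{-1} - \sum_{j=1}^{p}\left[E(\phi_j^2) - D_j v_{\phi_j}^{-2}\right]} \\
&= n\sigma_e^2 H, \quad (3.25)
\end{aligned}
$$

say. The approximation above is based on a direct result of Theorem 3.4.1, namely that
the quantization errors for all the components in a vector will be the same if the
Huang-Schultheiss bit allocation scheme is used.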

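Since we are interested in the average values of the quantization levels, they will
be treated as continuous variables although this is not the case in practice. So the best
choice of $\{v_{\phi_j}, j = 1, 2, \ldots, p\}$ can be determined by differentiating the expression
(3.25) with respect to $v_{\phi_j}^2$ and setting the derivative to zero. Note that

$$
\begin{aligned}
v_{\phi_j} &= 2^{m_j}, \\
v_2 &= 2^{\left(s - \sum_{l=1}^{p} m_l\right)/n}\,[E(\gamma_0)]^{-\frac{p}{2n}}
 = v_{\phi_j}^{-1/n}\, 2^{\left(s - \sum_{l \neq j} m_l\right)/n}\,[E(\gamma_0)]^{-\frac{p}{2n}},
\end{aligned}
$$

so

$$\frac{\partial v_2^2}{\partial v_{\phi_j}^2} = -\frac{v_2^2}{n v_{\phi_j}^2}.$$

We have

$$\frac{\partial H}{\partial v_{\phi_j}^2} = \frac{U_j + V_j}{W^2},$$

where

$$
\begin{aligned}
W &= v_2^2 I^{-1} - \sum_{l=1}^{p}\left[E(\phi_l^2) - D_l v_{\phi_l}^{-2}\right], \\
U_j &= \left(1 + \sum_{l=1}^{p} G_l v_{\phi_l}^{-2}\right)\left(\frac{D_j}{v_{\phi_j}^4} + \frac{v_2^2}{n I v_{\phi_j}^2}\right), \\
V_j &= -\frac{G_j}{v_{\phi_j}^4}\, W.
\end{aligned}
$$

The optimal $v_{\phi_j}$'s are obtained by solving the equations

$$
\left(1 + \sum_{l=1}^{p} G_l v_{\phi_l}^{-2}\right)\left(\frac{D_j}{v_{\phi_j}^4} + \frac{v_2^2}{n I v_{\phi_j}^2}\right)
- \frac{G_j}{v_{\phi_j}^4}\left\{v_2^2 I^{-1} - \sum_{l=1}^{p}\left[E(\phi_l^2) - D_l v_{\phi_l}^{-2}\right]\right\} = 0,
$$

for $j = 1, 2, \ldots, p$ simultaneously, subject to $v_1 / v_2 = [E(\gamma_0)]^{1/2}$ and $m + b = s$.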

In 3.4, we have mentioned that when the number of available bits is small, the
value $\sigma_{\tilde{e}}^2$ can be used to adjust the bit allocation between $v_1$ and $v_2$. Conversely,
we can calculate $\sigma_{\tilde{e}}^2$ again from the adjusted value of $v_2$. It can be proved that we
can get the optimal choice for $v_2$ by repeating this procedure. For fixed $\{v_{\phi_j}\}$, let
$v_2^{(1)}$ be the initial value for $v_2$ obtained from (3.10) using $\sigma_e^2$. Note that for fixed
$\{v_{\phi_j}\}$, $v_2^{(1)}$ is the smallest quantization level we could possibly assign to the residuals
since $\sigma_{\tilde{e}}^2 > \sigma_e^2$. Let $\sigma_{\tilde{e}}^{2(1)}$ be the value calculated from (3.22) using the initial $v_2^{(1)}$.
Note that the expression of $\sigma_{\tilde{e}}^2$ in (3.22) is a decreasing function of $v_2$. So $\sigma_{\tilde{e}}^{2(1)}$ is
the largest possible value for $\sigma_{\tilde{e}}^2$ if the Huang-Schultheiss bit allocation scheme is used.
Using (3.10) again, $v_2^{(2)}$ obtained from $\sigma_{\tilde{e}}^{2(1)}$ attains the maximum for $v_2$ and will
lead to the smallest value $\sigma_{\tilde{e}}^{2(2)}$ using the calculated $v_2^{(2)}$. But note that $\sigma_{\tilde{e}}^{2(2)} > \sigma_e^2$
unless the quantization levels $\{v_{\phi_j}\}$ and $v_2$ tend to infinity. Repeating this procedure
recursively, we have

$$
\begin{aligned}
v_2^{(1)} &< v_2^{(3)} < \ldots < v_2^{(2n+1)} < \ldots \\
v_2^{(2)} &> v_2^{(4)} > \ldots > v_2^{(2n)} > \ldots
\end{aligned}
$$

So $v_2^{(n)}$ converges and its limit will be the optimal choice for $v_2$.


3.6  Simulation  and  Discussion 


A computer simulation was run to evaluate the performance of the Bayesian
adaptive DPCM scheme discussed above. The data used in the simulation were from
an AR(1) model. Since there is only one parameter in this model, we will omit the
index in the following discussion.

First, we evaluated the goodness of the asymptotic approximation at low quantization
levels. In the simulation, the parameter $\phi$ was generated from a uniform
distribution (Table 3.1) and a distribution with linear density function (Table 3.2).
Both distributions were defined on the interval $[0, 1 - \delta]$ with $\delta = 10^{-4}$. Processes of
length $n = 20, 40, 60, 80, 100$ with normally distributed white noise with zero mean
and unit variance were generated using the parameter $\phi$. The parameter and the
prediction residuals were then quantized using the Bayesian adaptive DPCM scheme
with $s/n$, the average number of bits per observation, equal to 3, 4, and 5, respectively.
The theoretical mean square quantization error $E[(\tilde{e}_i - \hat{e}_i)^2]$ was calculated by using
the approximation formula (3.25), and the estimate $\hat{E}[(\tilde{e}_i - \hat{e}_i)^2]$ was obtained as the
average value over 1000 simulated series. Since in practice the bit number and the
quantization level take only integer values, the optimal values are rounded up to the
nearest integer. The results are listed in Table 3.1 (for the uniform prior) and Table 3.2
(for the linear density prior).
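
The data-generation step of this simulation can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the tables: the function name `simulate_ar1`, the seed, and the choice to draw the first observation from the stationary distribution are all hypothetical, since the text does not say how the series were initialized.

```python
import random


def simulate_ar1(n, phi, rng, sigma_e=1.0):
    """Generate n observations of x_i = phi * x_{i-1} + e_i with Gaussian white noise."""
    # Assumption: start from the stationary distribution N(0, sigma_e^2 / (1 - phi^2)).
    x = rng.gauss(0.0, sigma_e / (1.0 - phi * phi) ** 0.5)
    series = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, sigma_e)
        series.append(x)
    return series


DELTA = 1e-4
rng = random.Random(0)

# Uniform prior on [0, 1 - delta], as used for Table 3.1.
phi = rng.uniform(0.0, 1.0 - DELTA)
series = simulate_ar1(100, phi, rng)

# Unquantized prediction residuals e_i = x_i - phi * x_{i-1} for i > 1.
residuals = [series[i] - phi * series[i - 1] for i in range(1, len(series))]
print(len(series), len(residuals))
```

In the scheme described above, `phi` and `residuals` would then be passed through the quantizers with the chosen bit allocation; that step is omitted here.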

From Table 3.1 and Table 3.2, we can see that the MSE values from the simulation
are very close to those obtained by the asymptotic approximation for $m > 2$,
and the difference becomes smaller when the total number of available bits increases.
The optimal bit assignments from the simulation mostly coincide with those from the
theoretical approximation, with a largest departure of one bit. Since the mean
squared quantization error changes very little in the neighborhood of the optimal $v_\phi$,
this slight departure will not make a big difference in the general performance of this
scheme.

We are also interested in examining how the mean squared quantization error is
affected by the variation of the other factors. For convenience of discussion, we rewrite
the expressions (3.23) and (3.24) for the AR(1) model as follows:

$$
C(v_\phi, v_1, v_2) = \frac{v_\phi^2 + G}{v_2^2 v_\phi^2 I^{-1} - E(\phi^2) v_\phi^2 + D}, \quad (3.26)
$$

$$
E\left[(\tilde{e}_i - \hat{e}_i)^2\right] \approx \frac{(v_\phi^2 + G)\,\sigma_e^2}{v_2^2 v_\phi^2 I^{-1} - E(\phi^2) v_\phi^2 + D}. \quad (3.27)
$$

Among several factors which might affect the optimal choice of $v_\phi$, the sample
size $n$ has the most significant influence. Table 3.3 gives the optimal choices of $m$
against the series length $n$ with average bit rates 3, 4, and 5. The optimal values of
$v_\phi$ are also listed since they have smaller rounding error. We can see that the optimal
$v_\phi$ is an increasing function of $n$, which means we should assign more bits to the
quantization of the parameter if the sample size becomes larger. This conclusion
seems quite reasonable.

It seems surprising that the average bit rate, when it is not so small ($> 3$), has
little effect on the optimal choice of $v_\phi$. An explanation can be obtained from (3.27).
The constants $D$ and $G$ in the expression are generally small compared with the
values of the quantization levels $v_\phi$, $v_1$ and $v_2$. Note that $D$ and $G$ depend only on
the prior density assigned to the parameters, and in our simulation, $D = 0.0833$ and
$G = 0.4126$ for the uniform prior and $D = 0.0703$ and $G = 0.4323$ for the linear
prior. So when the quantization levels are large, these constants can be eliminated
from the expression (3.27), and we have

$$E\left[(\tilde{e}_i - \hat{e}_i)^2\right] \approx \frac{I\sigma_e^2}{v_2^2 - I\,E(\phi^2)}. \quad (3.28)$$

From (3.28), it is clear that the quantization error depends on the parameter
quantization only through $v_2$, which is fairly stable under the change of $v_\phi$.

The variance of the white noise $\sigma_e^2$ has no effect on the optimal bit allocation.
Equation (3.27) shows that the mean square quantization error is indeed proportional
to $\sigma_e^2$. But in (3.26), $C(v_\phi, v_1, v_2)$, from which the optimal value of $v_\phi$ is determined,
is independent of $\sigma_e^2$. This explains why the optimal design holds for
different values of $\sigma_e^2$.

The value $E(\phi^2)$ is usually small because of the restriction of admissibility. For
instance, the admissible regions for AR(1) and AR(2) models are $\{\phi : |\phi| < 1\}$ and
$\{(\phi_1, \phi_2) : \phi_2 + \phi_1 < 1,\ \phi_2 - \phi_1 < 1, \text{ and } |\phi_2| < 1\}$, respectively. We have $E(\phi^2) < 1$
for the AR(1) model and $E(\phi_1^2) < 4$ and $E(\phi_2^2) < 1$ for the AR(2) model. The
upper bounds are very unlikely to be attained, and for most practically used prior
distributions, the variance is far lower than the upper bound. In contrast, the square
of the quantization level is usually a fairly large number. Hence, the quantization
error can be approximated by

$$E\left[(\tilde{e}_i - \hat{e}_i)^2\right] \approx \frac{I\sigma_e^2}{v_2^2}, \quad (3.29)$$

which is the limit of the quantization error for the white noise.
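
The constants $D$ and $G$ quoted above for the AR(1) simulation can be reproduced in closed form. The sketch below assumes the single-parameter forms $D = \frac{1}{12}\left[\int \pi^{1/3}\,d\phi\right]^3$ and $G = \frac{1}{12}\left[\int \pi^{1/3}\,d\phi\right]^2 \int \gamma_0 \pi^{1/3}\,d\phi$ with $\gamma_0 = 1/(1 - \phi^2)$, which is how Theorems 3.5.1 and 3.5.2 read for $p = 1$; the variable names are hypothetical. Only the integrals with simple antiderivatives are evaluated, so $G$ for the linear prior is left out.

```python
import math

DELTA = 1e-4
UPPER = 1.0 - DELTA  # priors are supported on [0, 1 - delta]

# Uniform prior: pi(phi) = 1 / UPPER.
int_cbrt_uniform = UPPER ** (2.0 / 3.0)  # integral of pi^(1/3)
# Integral of pi^(1/3) / (1 - phi^2); the antiderivative of 1/(1 - phi^2) is atanh(phi).
int_gamma_cbrt_uniform = UPPER ** (-1.0 / 3.0) * math.atanh(UPPER)

D_uniform = int_cbrt_uniform**3 / 12.0
G_uniform = int_cbrt_uniform**2 * int_gamma_cbrt_uniform / 12.0

# Linear prior: pi(phi) = 2 * phi / UPPER**2.
int_cbrt_linear = (2.0 / UPPER**2) ** (1.0 / 3.0) * 0.75 * UPPER ** (4.0 / 3.0)
D_linear = int_cbrt_linear**3 / 12.0

print(round(D_uniform, 4), round(G_uniform, 4), round(D_linear, 4))
# 0.0833 0.4126 0.0703
```

The agreement with the quoted values supports the assumed forms of $D$ and $G$, and shows that $G$ is driven mainly by the growth of $\gamma_0$ as $\phi$ approaches 1.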

A comparison was made with the conventional DPCM and Cummiskey's adaptive
scheme. Two types of nonstationary time series were used in the simulation. The
first type of AR(1) series was composed of several subsequences with different lengths,
each decided by a random variable from a Poisson distribution with mean $\lambda$.
The parameter $\phi$ was from a uniform distribution, $\phi \sim U[0, 1 - \delta]$ with $\delta = 10^{-4}$. The
second type of nonstationary series was generated by using a time-varying parameter.
The initial $\phi$ had a uniform distribution on the interval $[0, 1 - \delta]$ and a small noise
$\Delta\phi \sim U(-0.1, 0.1)$ was added to it after each observation. $\phi + \Delta\phi$ is truncated at the
upper and lower bounds 0 and $1 - \delta$ if it is out of range. In the simulation, both types
of series with length $n = 1200$ were divided into a certain number of blocks with equal
size $l$ and each block was treated individually. In DPCM, the prediction was made
using a constant $\hat{\phi} = E(\phi)$ throughout the whole block without any adjustment,
so no bit was needed to quantize the parameter. In the Bayesian approach, $\phi$ was
estimated, and then prediction and quantization were performed within the block. In
Cummiskey's scheme, the first 10 observations were used to get the initial estimate
of the parameter $\phi$. Each estimate was used only once, for the prediction of the
next value, and after that the estimate was updated based on the new observation
according to

$$\Delta\phi = \frac{K e_i}{x_{i-1}}. \quad (3.30)$$

The best adaptive rate $K$ was found to be 0.1 by comparing different rates. The first
10 data were quantized directly while the others were predicted and the residuals were
quantized. The mean squared quantization errors using different schemes are listed
in the columns of Table 3.4 (first type) and Table 3.5 (second type), with different
choices of bit rates and block lengths in each row.
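
The second type of nonstationary series can be sketched as below. This is an illustrative reading of the description, not the original simulation code: the function names, the seed, the zero initial value and the unit noise variance are assumptions, and the block length 40 is just one of the lengths that appear in Table 3.5.

```python
import random


def drifting_ar1(n, rng, delta=1e-4, step=0.1, sigma_e=1.0):
    """AR(1) series whose parameter receives a U(-step, step) shock after each observation."""
    upper = 1.0 - delta
    phi = rng.uniform(0.0, upper)  # initial parameter from U[0, 1 - delta]
    x = 0.0                        # assumption: the initial value is not given in the text
    series, phis = [], []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma_e)
        series.append(x)
        phis.append(phi)
        # Add the random shock, then truncate to [0, 1 - delta] if out of range.
        phi = min(max(phi + rng.uniform(-step, step), 0.0), upper)
    return series, phis


def split_blocks(series, block_len):
    """Divide the series into consecutive blocks of equal size."""
    return [series[i:i + block_len] for i in range(0, len(series), block_len)]


series, phis = drifting_ar1(1200, random.Random(1))
blocks = split_blocks(series, 40)
print(len(blocks), min(phis), max(phis))
```

Each block would then be coded independently by the scheme under comparison.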

The results show that the Bayesian scheme works much better in the simulated cases.
From (3.30), we can see that the adaptation in Cummiskey's scheme depends only
on one previous observation as well as the current prediction error, so the process can
be very unstable. An observation near zero will cause a big jerk even if the prediction is
not so bad. In particular, if the jerk leads in the wrong direction, it is very difficult to pull
it back. Another weakness of Cummiskey's adaptation is its inability to handle
outliers. The increment in the adaptation is proportional to the prediction error,
so any bad prediction will make the further adaptation even worse. Furthermore,
the parameter adaptation based on the current prediction error may not work well
to predict the next observation. In the Bayesian adaptive scheme, the prediction is
made by using the data of the whole block; therefore, it is not so sensitive to outliers.
The cost of a few bits to quantize the parameter gets a big payoff in reducing the
prediction errors. This makes the Bayesian scheme superior in dealing with highly
nonstationary signals.
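
The sensitivity to small observations described above can be made concrete with a two-line experiment. The sketch assumes that the update (3.30) adds the prediction error, scaled by the rate $K$ and divided by the previous observation, to the current estimate; the function name and the numerical values are hypothetical.

```python
def cummiskey_step(phi_hat, x_prev, x_curr, rate=0.1):
    """One update phi_hat + rate * (prediction error) / x_prev (assumed form of (3.30))."""
    error = x_curr - phi_hat * x_prev
    return phi_hat + rate * error / x_prev


phi_hat = 0.5

# The prediction error is 0.5 in both cases; only the previous observation differs.
typical = cummiskey_step(phi_hat, x_prev=1.0, x_curr=1.0)
near_zero = cummiskey_step(phi_hat, x_prev=0.01, x_curr=0.505)

print(typical)    # about 0.55
print(near_zero)  # about 5.5, far outside the admissible region |phi| < 1
```

With the same prediction error, the update is a hundred times larger when the previous observation is a hundred times smaller, which is the "big jerk" mentioned in the text.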


Table 3.1. Continued.

Table 3.2. Mean squared quantization errors from asymptotic approximation and
simulation. $\pi(\phi) = (1 - \delta)^{-2}\, 2\phi$, $\delta = 10^{-4}$.

For notation, see Table 3.1.

Table 3.3. Optimal bit number (quantization level) for the parameter vs length of
the time series.


          s/n = 3          s/n = 4          s/n = 5
   n     m   (v_phi)      m   (v_phi)      m   (v_phi)

  10     1     (2)        1     (2)        1     (2)
  20     2     (4)        2     (3)        2     (3)
  30     2     (4)        2     (4)        2     (4)
  40     2     (5)        2     (5)        2     (5)
  50     3     (6)        3     (6)        3     (6)
  60     3     (6)        3     (6)        3     (6)
  70     3     (7)        3     (7)        3     (7)
  80     3     (7)        3     (7)        3     (7)
  90     3     (8)        3     (7)        3     (7)
 100     3     (8)        3     (8)        3     (8)

Symbols s, n and m are defined in Table 3.1; v_phi = quantization level for $\phi$.

Table 3.4. Comparison of the mean square quantization errors from different schemes
where the series consists of several blocks with length $l \sim \text{Poisson}(\lambda)$, and parameter
$\phi \sim U(0, 1 - \delta)$, $\delta = 10^{-4}$.


  λ      m     l      DPCM       Cummiskey's    Bayesian

  40     3    30     0.06871      0.05920       0.03702
              40     0.07017      0.06658       0.03786
              50     0.07135      0.06567       0.03852
         4    30     0.01633      0.01841       0.01019
              40     0.01603      0.01642       0.01044
              50     0.01571      0.01335       0.01052
         5    30     0.00337      0.00637       0.00266
              40     0.00352      0.00669       0.00251
              50     0.00367      0.00476       0.00248

  80     3    70     0.07395      0.05110       0.03660
              80     0.07422      0.05122       0.03695
              90     0.07369      0.05004       0.03658
         4    70     0.01446      0.01343       0.00936
              80     0.01439      0.01344       0.00954
              90     0.01390      0.01319       0.00941
         5    70     0.00372      0.00430       0.00233
              80     0.00378      0.00422       0.00239
              90     0.00378      0.00485       0.00235

Table 3.5. Comparison of the mean square quantization errors where the series was
generated by using initial parameter $\phi \sim U(0, 1 - \delta)$, $\delta = 10^{-4}$, and a random error
$\Delta\phi \sim U(-0.1, 0.1)$ was added after each observation*.


   m     l      DPCM       Cummiskey's    Bayesian

   3    30     0.05385      0.05603       0.03039
        40     0.05390      0.06658       0.03081
        50     0.05421      0.06112       0.03110
        80     0.05485      0.04676       0.03175
       100     0.05530      0.04675       0.03175

   4    30     0.01363      0.01570       0.00831
        40     0.01341      0.01471       0.00843
        50     0.01301      0.01301       0.00837
        80     0.01242      0.01234       0.00844
       100     0.01214      0.01244       0.00849

   5    30     0.00308      0.00548       0.00225
        40     0.00316      0.00520       0.00223
        50     0.00320      0.00430       0.00222
        80     0.00329      0.00387       0.00222
       100     0.00333      0.00407       0.00224

* The new parameter $\phi + \Delta\phi$ is truncated at the upper and lower bounds 0 and
$1 - \delta$ if it is out of range.


CHAPTER  4 


BAYESIAN  ADAPTIVE  DPCM  FOR  THE  TWO  DIMENSIONAL  AR(1,1)  MODEL 


4.1  Introduction 


Two dimensional signal transmission, such as television broadcasting, facsimile
transmission and teleconference service, plays an increasingly important role in
everyday life, so research in this field has great practical significance.
Quantization is an important step in two dimensional signal transmission and may
affect the picture quality to a certain extent. An adaptive technique is also
needed to deal with the nonstationary nature of such signals. Simulation results
have shown the Bayesian adaptive DPCM scheme to be a stable method for one
dimensional signal quantization. It is desirable to extend the result to the two
dimensional case.

A two  dimensional  AR(1,1)  model  is  adopted  to  describe  the  data.  The  sample 
drawn  from  a picture  is  taken  as  a two  dimensional  sequence  starting  at  its  lower- 
left  corner.  The  statistical  property  of  the  current  observation  is  determined  by  the 
observed  values  one  lag  away  in  its  neighborhood. 

The  derivation  of  the  scheme  and  the  optimal  bit  allocation  is  similar  to  the 
procedure  in  the  one  dimensional  case.  But  due  to  the  special  characteristic  of  the 
two  dimensional  model,  some  adjustment  is  necessary  for  the  extension. 

A special AR(1,1) model adopted in our approach is introduced in 4.2 and some
related features are discussed. In 4.3, the Bayesian adaptive DPCM scheme is
extended to the two dimensional case. The simulation result is given in 4.4.


4.2  Two Dimensional AR(1,1) Model


The Bayesian adaptive DPCM can be extended to the two dimensional case for
the stationary AR(1,1) model defined as

$$x_{ij} = \phi_1 x_{i-1,j} + \phi_2 x_{i,j-1} - \phi_1\phi_2 x_{i-1,j-1} + \epsilon_{ij}, \tag{4.1}$$

where $|\phi_1| < 1$, $|\phi_2| < 1$ and $\{\epsilon_{ij}\}$ are i.i.d. normally distributed with mean zero and
variance $\sigma_\epsilon^2$.
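
As an illustration of model (4.1), the recursion can be run directly to generate a sample field. The following is a minimal sketch, not the simulation code used in this work: the function name `simulate_ar11`, the zero start-up values and the burn-in length `burn` are assumptions made only for this example.

```python
import numpy as np

def simulate_ar11(n1, n2, phi1, phi2, sigma_eps=1.0, burn=50, seed=0):
    """Generate an n1 x n2 field from recursion (4.1).

    Zero start-up values are used (an assumption of this sketch), and the
    first `burn` rows and columns are discarded so that the retained block
    is close to stationary.
    """
    rng = np.random.default_rng(seed)
    N1, N2 = n1 + burn, n2 + burn
    eps = rng.normal(0.0, sigma_eps, size=(N1, N2))
    x = np.zeros((N1 + 1, N2 + 1))  # row 0 and column 0 hold the start-up zeros
    for i in range(1, N1 + 1):
        for j in range(1, N2 + 1):
            x[i, j] = (phi1 * x[i - 1, j] + phi2 * x[i, j - 1]
                       - phi1 * phi2 * x[i - 1, j - 1] + eps[i - 1, j - 1])
    return x[burn + 1:, burn + 1:]

field = simulate_ar11(200, 200, phi1=0.5, phi2=0.3)
```

With these illustrative parameters the sample variance of `field` should be close to $\sigma_\epsilon^2 / [(1 - \phi_1^2)(1 - \phi_2^2)]$, the stationary variance of the model.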

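Model (4.1) has some nice properties, one of which is stated in the next theorem.

Theorem 4.2.1

Let $\{x_{ij},\ \pm i = 0, 1, 2, \ldots,\ \pm j = 0, 1, 2, \ldots\}$ be a sample from model (4.1). Then, for positive integers $m$ and $n$,

$$x_{ij} = \phi_1^m x_{i-m,j} + \phi_2^n x_{i,j-n} - \phi_1^m \phi_2^n x_{i-m,j-n} + e'_{ij}, \tag{4.2}$$

where

$$e'_{ij} = \sum_{k=0}^{m-1} \sum_{l=0}^{n-1} \phi_1^k \phi_2^l \, \epsilon_{i-k,j-l}.$$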


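Proof of Theorem 4.2.1

This result can be proved by induction. For $m = n = 1$, (4.2) is model (4.1) itself. Assume (4.2) holds for some integers $m$
and $n$. We will prove it holds for $m + 1$ and $n + 1$.

The observations $x_{i-m,j}$ and $x_{i,j-n}$ can be iteratively expressed by the previous
observations using model (4.1). We will see that two terms get canceled at each step
of the iteration.

$$\begin{aligned}
x_{i-m,j} &= \phi_1 x_{i-m-1,j} + \phi_2 x_{i-m,j-1} - \phi_1\phi_2 x_{i-m-1,j-1} + \epsilon_{i-m,j} \\
&= \phi_1 x_{i-m-1,j} + \phi_2^2 x_{i-m,j-2} - \phi_1\phi_2^2 x_{i-m-1,j-2} + \phi_2 \epsilon_{i-m,j-1} + \epsilon_{i-m,j} \\
&\;\;\vdots \\
&= \phi_1 x_{i-m-1,j} + \phi_2^n x_{i-m,j-n} - \phi_1\phi_2^n x_{i-m-1,j-n} + \sum_{l=0}^{n-1} \phi_2^l \epsilon_{i-m,j-l}.
\end{aligned} \tag{4.3}$$

Similarly,

$$x_{i,j-n} = \phi_2 x_{i,j-n-1} + \phi_1^m x_{i-m,j-n} - \phi_1^m\phi_2 x_{i-m,j-n-1} + \sum_{k=0}^{m-1} \phi_1^k \epsilon_{i-k,j-n}. \tag{4.4}$$

Substituting (4.3) and (4.4) into (4.2), we have

$$\begin{aligned}
x_{ij} &= \phi_1^m x_{i-m,j} + \phi_2^n x_{i,j-n} - \phi_1^m\phi_2^n x_{i-m,j-n} + e'_{ij} \\
&= \phi_1^{m+1} x_{i-m-1,j} + \phi_1^m\phi_2^n x_{i-m,j-n} - \phi_1^{m+1}\phi_2^n x_{i-m-1,j-n} + \sum_{l=0}^{n-1} \phi_1^m\phi_2^l \epsilon_{i-m,j-l} \\
&\quad + \phi_2^{n+1} x_{i,j-n-1} + \phi_1^m\phi_2^n x_{i-m,j-n} - \phi_1^m\phi_2^{n+1} x_{i-m,j-n-1} + \sum_{k=0}^{m-1} \phi_1^k\phi_2^n \epsilon_{i-k,j-n} \\
&\quad - \phi_1^m\phi_2^n x_{i-m,j-n} + \sum_{k=0}^{m-1}\sum_{l=0}^{n-1} \phi_1^k\phi_2^l \epsilon_{i-k,j-l} \\
&= \phi_1^{m+1} x_{i-m-1,j} + \phi_2^{n+1} x_{i,j-n-1} - \phi_1^{m+1}\phi_2^{n+1} x_{i-m-1,j-n-1} + \sum_{k=0}^{m}\sum_{l=0}^{n} \phi_1^k\phi_2^l \epsilon_{i-k,j-l}.
\end{aligned}$$

The last equality follows by applying model (4.1) at the point $(i-m, j-n)$, which gives
$x_{i-m,j-n} - \phi_1 x_{i-m-1,j-n} - \phi_2 x_{i-m,j-n-1} = -\phi_1\phi_2 x_{i-m-1,j-n-1} + \epsilon_{i-m,j-n}$.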

Theorem  4.2.1  shows  that  the  AR(1,1)  model  (4.1)  holds  no  matter  what  lags  m 
and  n are  used.  Model  (4.1)  is  a special  case  of  (4.2)  with  m = n = 1. 
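
The identity (4.2) can also be checked numerically. The sketch below is only an illustration under stated assumptions: the field is generated from recursion (4.1) with zero start-up values in row 0 and column 0, and the names `x`, `eps` and `rhs_42` are hypothetical. For any point whose lagged indices stay inside the grid, both sides of (4.2) should agree up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(1)
phi1, phi2 = 0.6, -0.4
N = 20

# eps and x share the same indexing; row 0 and column 0 of x are start-up zeros.
eps = rng.normal(size=(N + 1, N + 1))
x = np.zeros((N + 1, N + 1))
for i in range(1, N + 1):
    for j in range(1, N + 1):
        x[i, j] = (phi1 * x[i - 1, j] + phi2 * x[i, j - 1]
                   - phi1 * phi2 * x[i - 1, j - 1] + eps[i, j])

def rhs_42(i, j, m, n):
    """Right-hand side of (4.2) for lags m and n (requires i >= m, j >= n)."""
    e_prime = sum(phi1**k * phi2**l * eps[i - k, j - l]
                  for k in range(m) for l in range(n))
    return (phi1**m * x[i - m, j] + phi2**n * x[i, j - n]
            - phi1**m * phi2**n * x[i - m, j - n] + e_prime)

# Difference between the two sides of (4.2) at one grid point.
gap = abs(x[15, 17] - rhs_42(15, 17, 3, 4))
```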

For the one dimensional AR(1) model, an observation can be written as an infinite
series of the previous noise. The variance and autocovariance can be easily computed
from the series expression. This result can be extended to the two dimensional model
(4.1).

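Lemma 4.2.1

Assume $\{x_{ij},\ \pm i = 0, 1, 2, \ldots,\ \pm j = 0, 1, 2, \ldots\}$ are from the model (4.1). Then,

$$x_{ij} = \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \phi_1^k \phi_2^l \, \epsilon_{i-k,j-l}. \tag{4.5}$$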


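Proof of Lemma 4.2.1

By iterating model (4.1), we obtain

$$\begin{aligned}
x_{ij} &= \phi_1 x_{i-1,j} + \phi_2 x_{i,j-1} - \phi_1\phi_2 x_{i-1,j-1} + \epsilon_{ij} \\
&= \phi_1^2 x_{i-2,j} + \phi_1\phi_2 x_{i-1,j-1} + \phi_2^2 x_{i,j-2} - \phi_1^2\phi_2 x_{i-2,j-1} - \phi_1\phi_2^2 x_{i-1,j-2} \\
&\quad + \phi_1 \epsilon_{i-1,j} + \phi_2 \epsilon_{i,j-1} + \epsilon_{ij} \\
&\;\;\vdots \\
&= \sum_{S_{n1}} \phi_1^k\phi_2^l x_{i-k,j-l} - \sum_{S_{n2}} \phi_1^k\phi_2^l x_{i-k,j-l} + \sum_{T_n} \phi_1^k\phi_2^l \epsilon_{i-k,j-l},
\end{aligned}$$

where $T_n$, $S_{n1}$ and $S_{n2}$ are sets of pairs of nonnegative integers and they are, with
fixed integer $n$, defined by

$$\begin{aligned}
T_n &= \{(k, l) : k \ge 0,\ l \ge 0,\ k + l < n\}, \\
S_{n1} &= \{(k, l) : k \ge 0,\ l \ge 0,\ k + l = n\}, \\
S_{n2} &= \{(k, l) : k > 0,\ l > 0,\ k + l = n + 1\}.
\end{aligned}$$

Note that the sets $S_{n1}$ and $S_{n2}$ contain $n + 1$ and $n$ elements, respectively. It is
easy to check that

$$\sum_{S_{n1}} |\phi_1^k \phi_2^l| = \sum_{S_{n1}} |\phi_1|^k |\phi_2|^l \le \sum_{S_{n1}} [\max(|\phi_1|, |\phi_2|)]^n = (n + 1) \max(|\phi_1|^n, |\phi_2|^n) \to 0$$

as $n \to \infty$ since $|\phi_1| < 1$ and $|\phi_2| < 1$. Similarly, $\sum_{S_{n2}} |\phi_1^k \phi_2^l| \to 0$.

By the inequality

$$(a_1 + a_2)^2 \le 2 (a_1^2 + a_2^2)$$

for any real $a_1$, $a_2$, and since $|E(x_{i-k,j-l}\, x_{i-k',j-l'})| \le \sigma_x^2$ for a stationary field, we have

$$\begin{aligned}
E\Big[x_{ij} - \sum_{T_n} \phi_1^k\phi_2^l \epsilon_{i-k,j-l}\Big]^2
&= E\Big[\sum_{S_{n1}} \phi_1^k\phi_2^l x_{i-k,j-l} - \sum_{S_{n2}} \phi_1^k\phi_2^l x_{i-k,j-l}\Big]^2 \\
&\le 2E\Big[\Big(\sum_{S_{n1}} \phi_1^k\phi_2^l x_{i-k,j-l}\Big)^2 + \Big(\sum_{S_{n2}} \phi_1^k\phi_2^l x_{i-k,j-l}\Big)^2\Big] \\
&\le 2\sigma_x^2 \Big[\Big(\sum_{S_{n1}} |\phi_1^k\phi_2^l|\Big)^2 + \Big(\sum_{S_{n2}} |\phi_1^k\phi_2^l|\Big)^2\Big] \\
&\to 0,
\end{aligned}$$

as $n \to \infty$ by the previous result.

Hence, $\sum_{T_n} \phi_1^k\phi_2^l \epsilon_{i-k,j-l}$ converges to $x_{ij}$ in mean square by the Cauchy criterion. We
conclude that

$$x_{ij} = \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \phi_1^k \phi_2^l \, \epsilon_{i-k,j-l}.$$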

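The next theorem gives the variance and autocovariance of $\{x_{ij}\}$ by applying its
series expansion.

Theorem 4.2.2

Assume $\{x_{ij},\ \pm i = 0, 1, 2, \ldots,\ \pm j = 0, 1, 2, \ldots\}$ are from the model (4.1). Then,

$$\begin{aligned}
\sigma_x^2 &= \mathrm{Var}(x_{ij}) = \frac{\sigma_\epsilon^2}{(1 - \phi_1^2)(1 - \phi_2^2)}, \\
\gamma(p, q) &= E(x_{ij}\, x_{i+p,j+q}) = \phi_1^{|p|} \phi_2^{|q|} \sigma_x^2.
\end{aligned} \tag{4.6}$$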


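Proof of Theorem 4.2.2

By Lemma 4.2.1,

$$\sigma_x^2 = E\Big(\sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \phi_1^k\phi_2^l \epsilon_{i-k,j-l}\Big)^2 = \sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \phi_1^{2k}\phi_2^{2l} \sigma_\epsilon^2 = \frac{\sigma_\epsilon^2}{(1 - \phi_1^2)(1 - \phi_2^2)}$$

since the $\epsilon_{ij}$'s are all independent.

For positive $p$ and $q$,

$$\begin{aligned}
\gamma(p, q) &= E(x_{ij}\, x_{i-p,j-q}) \\
&= E\Big[\Big(\sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \phi_1^k\phi_2^l \epsilon_{i-k,j-l}\Big)\Big(\sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \phi_1^k\phi_2^l \epsilon_{i-p-k,j-q-l}\Big)\Big] \\
&= E\Big[\Big(\sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \phi_1^k\phi_2^l \epsilon_{i-k,j-l}\Big)\Big(\sum_{k=p}^{\infty}\sum_{l=q}^{\infty} \phi_1^{k-p}\phi_2^{l-q} \epsilon_{i-k,j-l}\Big)\Big] \\
&= \sigma_\epsilon^2 \sum_{k=p}^{\infty}\sum_{l=q}^{\infty} \phi_1^{2k-p}\phi_2^{2l-q} \\
&= \phi_1^p \phi_2^q \, \frac{\sigma_\epsilon^2}{(1 - \phi_1^2)(1 - \phi_2^2)} \\
&= \phi_1^p \phi_2^q \sigma_x^2.
\end{aligned}$$

So

$$\gamma(p, q) = \phi_1^{|p|} \phi_2^{|q|} \sigma_x^2$$

by symmetry.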
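
The closed form (4.6) can be compared with a truncated version of the double series from Lemma 4.2.1. The following sketch is illustrative only; the helper names `gamma_series` and `gamma_closed` and the truncation length `K` are assumptions of this example. Because the double sum factorizes into a product of two single sums, each factor is computed separately.

```python
import numpy as np

def gamma_series(p, q, phi1, phi2, sigma_eps2=1.0, K=400):
    """Autocovariance at lag (p, q) from the series (4.5), truncated at K terms."""
    k = np.arange(K)
    s1 = np.sum(phi1**k * phi1**(k + abs(p)))   # sum over k of phi1^(2k + |p|)
    s2 = np.sum(phi2**k * phi2**(k + abs(q)))   # sum over l of phi2^(2l + |q|)
    return sigma_eps2 * s1 * s2

def gamma_closed(p, q, phi1, phi2, sigma_eps2=1.0):
    """Closed form (4.6): phi1^|p| * phi2^|q| * sigma_x^2."""
    var_x = sigma_eps2 / ((1 - phi1**2) * (1 - phi2**2))
    return phi1**abs(p) * phi2**abs(q) * var_x

# The two values should agree closely for |phi1|, |phi2| < 1.
pair = (gamma_series(2, 3, 0.7, 0.5), gamma_closed(2, 3, 0.7, 0.5))
```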

Multiplying both sides of (4.1) by $x_{i-1,j}$ and $x_{i,j-1}$ respectively and then taking
the expectation, we get the Yule-Walker equations

$$\begin{aligned}
\gamma(1, 0) &= \phi_1 \gamma(0, 0) + \phi_2 \gamma(1, 1) - \phi_1\phi_2 \gamma(0, 1), \\
\gamma(0, 1) &= \phi_1 \gamma(1, 1) + \phi_2 \gamma(0, 0) - \phi_1\phi_2 \gamma(1, 0)
\end{aligned} \tag{4.7}$$

for this two dimensional AR(1,1) model. Estimates of the parameters $\phi_1$ and $\phi_2$ can
be obtained by substituting the sample variance and autocovariance into (4.7) and
solving it with respect to $\phi_1$ and $\phi_2$.
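
One possible way to solve (4.7) numerically is a fixed-point iteration that rearranges each equation for one parameter. This is a sketch under assumptions, not the estimation procedure used in this work: the function names `sample_autocov` and `yule_walker_ar11` are hypothetical, the field is assumed to have zero mean, and convergence of the iteration is not guaranteed for arbitrary sample autocovariances.

```python
import numpy as np

def sample_autocov(x, p, q):
    """Sample autocovariance at lag (p, q), p, q >= 0, for a zero-mean field."""
    n1, n2 = x.shape
    return np.mean(x[p:, q:] * x[:n1 - p, :n2 - q])

def yule_walker_ar11(g00, g10, g01, g11, iters=50):
    """Solve the two equations (4.7) for (phi1, phi2) by fixed-point iteration.

    Each update isolates one parameter:
      phi1 * (g00 - phi2 * g01) = g10 - phi2 * g11
      phi2 * (g00 - phi1 * g10) = g01 - phi1 * g11
    """
    phi1, phi2 = g10 / g00, g01 / g00   # starting values
    for _ in range(iters):
        phi1 = (g10 - phi2 * g11) / (g00 - phi2 * g01)
        phi2 = (g01 - phi1 * g11) / (g00 - phi1 * g10)
    return phi1, phi2

# Exact autocovariances from (4.6) with phi1 = 0.6, phi2 = 0.3, sigma_x^2 = 2.
est = yule_walker_ar11(g00=2.0, g10=1.2, g01=0.6, g11=0.36)
```

When the exact autocovariances of (4.6) are supplied, the iteration reproduces the true parameters; with sample autocovariances it returns estimates.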

The least squares predictor for $x_{ij}$ based on the previous observations is

$$\hat x_{ij} = \phi_1 x_{i-1,j} + \phi_2 x_{i,j-1} - \phi_1\phi_2 x_{i-1,j-1} \tag{4.8}$$

with mean squared prediction error $\sigma_\epsilon^2$.


4.3  Two  Dimensional  Bayesian  Adaptive  DPCM 


The derivation of the Bayesian adaptive DPCM for the two dimensional AR(1,1)
model is similar to that of the one dimensional AR(1) model, so similar notation
is used without further explanation. Results that can be derived in the same way
are stated without proof. Because the two models differ, however, the derivation
is not a simple repetition of the same procedure, and detailed proofs are given
where they are needed.

Let $\{x_{ij},\ i = 1, 2, \ldots, n_1,\ j = 1, 2, \ldots, n_2\}$ be a sample from model (4.1). Considering
that the sample we are dealing with is finite, we rewrite the truncated version
of model (4.1) in vector form

$$\begin{cases}
x_{ij} \sim N(0, \sigma_x^2), & \text{if } i = 1 \text{ or } j = 1 \\
x_{ij} = \Phi' X_{ij} + \epsilon_{ij}, & \text{otherwise,}
\end{cases} \tag{4.9}$$

where $\Phi' = (\phi_1, \phi_2, -\phi_1\phi_2)$ and $X_{ij}' = (x_{i-1,j}, x_{i,j-1}, x_{i-1,j-1})$. We assume $\phi_1$, $\phi_2$
and the white noise are all independent and assign a joint prior density function
$\pi(\phi_1, \phi_2)$ to $\phi_1$ and $\phi_2$.

For $i = 1$ or $j = 1$ no prediction is available and the observation itself plays the role of the residual. The prediction residual is given by

$$e_{ij} = \begin{cases}
x_{ij}, & \text{if } i = 1 \text{ or } j = 1 \\
x_{ij} - \hat\Phi' \hat X_{ij}, & \text{otherwise.}
\end{cases} \tag{4.10}$$

Since only quantized values $\{\hat e_{ij}\}$, $\hat\phi_1$ and $\hat\phi_2$ are available on the receiver side, the
reconstructed value for $x_{ij}$ is

$$\hat x_{ij} = \hat\Phi' \hat X_{ij} + \hat e_{ij}. \tag{4.11}$$

It is the direct result of equations (4.10) and (4.11) that the reconstruction error
is equal to the quantization error of the residual $e_{ij}$, i.e.

$$e_{ij} - \hat e_{ij} = x_{ij} - \hat x_{ij}. \tag{4.12}$$
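
The encoder and decoder loop behind (4.10) to (4.12) can be sketched as follows. This is a minimal illustration under stated assumptions: a uniform mid-tread quantizer stands in for the L-M quantizer, the function names `uniform_quantize` and `dpcm2d` and the step sizes are hypothetical, and array indices start at 0, so the first row and column correspond to $i = 1$ or $j = 1$ in the text.

```python
import numpy as np

def uniform_quantize(v, step):
    """Uniform mid-tread quantizer; a simple stand-in for the L-M quantizer."""
    return step * np.round(v / step)

def dpcm2d(x, phi1_hat, phi2_hat, step_edge, step_resid):
    """Two dimensional DPCM with prediction from reconstructed neighbours.

    Returns the residuals e, the quantized residuals e_hat and the
    reconstruction x_hat, following (4.10) and (4.11).
    """
    n1, n2 = x.shape
    e = np.zeros((n1, n2))
    e_hat = np.zeros((n1, n2))
    x_hat = np.zeros((n1, n2))
    for i in range(n1):
        for j in range(n2):
            if i == 0 or j == 0:
                # First row or column: the observation is quantized directly.
                pred, step = 0.0, step_edge
            else:
                pred = (phi1_hat * x_hat[i - 1, j] + phi2_hat * x_hat[i, j - 1]
                        - phi1_hat * phi2_hat * x_hat[i - 1, j - 1])
                step = step_resid
            e[i, j] = x[i, j] - pred
            e_hat[i, j] = uniform_quantize(e[i, j], step)
            x_hat[i, j] = pred + e_hat[i, j]
    return e, e_hat, x_hat

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
e, e_hat, x_hat = dpcm2d(x, 0.5, 0.3, step_edge=0.5, step_resid=0.25)
```

Because the prediction is formed from reconstructed values on both sides, `x - x_hat` equals `e - e_hat` elementwise, which is relation (4.12).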

Just as in the one dimensional case, the prediction residuals $\{e_{ij}\}$ tend to the
white noise in mean square as the quantization levels tend to infinity and they are
asymptotically independent.

We can see that model (4.9) has a similar form to the one dimensional AR(p)
model (3.2). But there are indeed some differences. One important feature of the
two dimensional model is that the vector $\Phi$ has three components but only two of
them can be determined independently. We need the next lemma to show that the
quantized vector $\hat\Phi$ is orthogonal to the quantization error $\Phi - \hat\Phi$.

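Lemma 4.3.1

Assume $\phi_1$ and $\phi_2$ are the parameters for model (4.9) and $\hat\phi_1$ and $\hat\phi_2$ are their L-M
quantizers. Then,

$$E[(\phi_1\phi_2 - \hat\phi_1\hat\phi_2)\,\hat\phi_1\hat\phi_2] = 0. \tag{4.13}$$

Proof of Lemma 4.3.1

$$\begin{aligned}
E[(\phi_1\phi_2 - \hat\phi_1\hat\phi_2)\,\hat\phi_1\hat\phi_2]
&= E[(\phi_1\phi_2 - \hat\phi_1\phi_2 + \hat\phi_1\phi_2 - \hat\phi_1\hat\phi_2)\,\hat\phi_1\hat\phi_2] \\
&= E[(\phi_1\phi_2 - \hat\phi_1\phi_2)\,\hat\phi_1\hat\phi_2] + E[(\hat\phi_1\phi_2 - \hat\phi_1\hat\phi_2)\,\hat\phi_1\hat\phi_2] \\
&= E[(\phi_1 - \hat\phi_1)\,\hat\phi_1\,\phi_2\hat\phi_2] + E[(\phi_2 - \hat\phi_2)\,\hat\phi_2\,\hat\phi_1^2] \\
&= 0
\end{aligned}$$

by Lemma 3.2.2, since $\phi_1$ and $\phi_2$ are independent.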
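
Relation (4.13) can be illustrated numerically. The sketch below is an assumption-laden example rather than the quantizer design used in this work: the prior is approximated by equally weighted grid points on $(0, 0.99)$, the L-M quantizer is obtained by Lloyd iterations on those points, and the names `lloyd_max`, `s`, `s_hat` and `lhs` are hypothetical. Independence of $\phi_1$ and $\phi_2$ is imposed by averaging over all pairs of grid points.

```python
import numpy as np

def lloyd_max(samples, levels, iters=50):
    """Lloyd iterations on equally weighted points.

    Returns the sorted points and, for each point, the centroid of the
    cell it falls in (its quantized value).
    """
    s = np.sort(np.asarray(samples, dtype=float))
    codebook = np.quantile(s, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        idx = np.argmin(np.abs(s[:, None] - codebook[None, :]), axis=1)
        for c in range(levels):
            if np.any(idx == c):
                codebook[c] = s[idx == c].mean()
    # Final assignment; each point is mapped to the exact centroid of its cell.
    idx = np.argmin(np.abs(s[:, None] - codebook[None, :]), axis=1)
    s_hat = np.empty_like(s)
    for c in np.unique(idx):
        s_hat[idx == c] = s[idx == c].mean()
    return s, s_hat

s, s_hat = lloyd_max(np.linspace(0.0, 0.99, 500), levels=4)

# All pairs (phi1, phi2): an exact product measure, i.e. independent parameters.
P1, P2 = np.meshgrid(s, s, indexing="ij")
H1, H2 = np.meshgrid(s_hat, s_hat, indexing="ij")
lhs = np.mean((P1 * P2 - H1 * H2) * H1 * H2)   # left-hand side of (4.13)
```

The value of `lhs` vanishes up to rounding error because the centroid condition gives $E(\phi_k \hat\phi_k) = E(\hat\phi_k^2)$ for each parameter.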

The next theorem proves the asymptotic independence between $\hat X_{ij}$ and $(X_{ij} - \hat X_{ij})$
for fixed $\phi_1$ and $\phi_2$.

Theorem 4.3.1

Let $X_{ij}$ be defined as in model (4.9) and let $\hat X_{ij}$ be the L-M quantizer for $X_{ij}$. Then,

$$E[\hat X_{ij} (X_{ij} - \hat X_{ij})' \mid \Phi] \to 0, \tag{4.14}$$

as the quantization levels tend to infinity.


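Proof of Theorem 4.3.1

$$\hat X_{ij} (X_{ij} - \hat X_{ij})' =
\begin{pmatrix}
(x_{i-1,j} - \hat x_{i-1,j})\hat x_{i-1,j} & (x_{i,j-1} - \hat x_{i,j-1})\hat x_{i-1,j} & (x_{i-1,j-1} - \hat x_{i-1,j-1})\hat x_{i-1,j} \\
(x_{i-1,j} - \hat x_{i-1,j})\hat x_{i,j-1} & (x_{i,j-1} - \hat x_{i,j-1})\hat x_{i,j-1} & (x_{i-1,j-1} - \hat x_{i-1,j-1})\hat x_{i,j-1} \\
(x_{i-1,j} - \hat x_{i-1,j})\hat x_{i-1,j-1} & (x_{i,j-1} - \hat x_{i,j-1})\hat x_{i-1,j-1} & (x_{i-1,j-1} - \hat x_{i-1,j-1})\hat x_{i-1,j-1}
\end{pmatrix}$$

All the components of the matrix have the form $(x_{ij} - \hat x_{ij})\hat x_{kl}$, and they may
be divided into three groups. We now prove that, in each group, the conditional
expectations tend to zero.

If $i = k$ and $j = l$, the conditional expectation vanishes by the property of the
L-M quantizer.

If $i = k + 1$ or $j = l + 1$,

$$(x_{ij} - \hat x_{ij}) = (e_{ij} - \hat e_{ij}) \approx (\epsilon_{ij} - \hat\epsilon_{ij}),$$

which is independent of $\hat x_{kl}$. So the result holds.

If $i = k - 1$ or $j = l - 1$, then $\hat x_{kl}$ can be written as a linear combination of
the quantized prediction errors $\{\hat e_{pq},\ p \le i \text{ or } q \le j\}$. All terms in the combination are
asymptotically independent of $(x_{ij} - \hat x_{ij})$ except the one containing $\hat e_{ij}$. The conditional
expectation of this term vanishes by the result in the case of $i = k$ and $j = l$.

This completes the proof.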

We rewrite $e_{ij}$ as

$$e_{ij} = x_{ij} - \hat\Phi' \hat X_{ij} = \epsilon_{ij} + (\Phi - \hat\Phi)' X_{ij} + \hat\Phi' (X_{ij} - \hat X_{ij}).$$

By Theorem 4.3.1 and the fact that $\{\epsilon_{ij}\}$ is independent of the parameters and the
previous observations, the cross product terms vanish asymptotically. Then,

$$\sigma_e^2 = E(e_{ij}^2) = \sigma_\epsilon^2 + E\big[(\Phi - \hat\Phi)' X_{ij} X_{ij}' (\Phi - \hat\Phi)\big] + E\big[\hat\Phi' (X_{ij} - \hat X_{ij})(X_{ij} - \hat X_{ij})' \hat\Phi\big]$$

by Lemma 4.3.1 and Theorem 4.3.1.

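Suppose that we have $s$ bits available in total and $m_k$ bits will be assigned to the
parameter $\phi_k$. So the quantization level for $\phi_k$ is $v_{\phi_k} = 2^{m_k}$. For the $b$ bits left after
assigning $m = m_1 + m_2$ bits to the parameters, let $b_1$ and $b_2$ be the numbers of bits assigned to the
observations $x_{ij}$ with index $i = 1$ or $j = 1$ and the prediction residuals respectively.
Then, by Huang and Schultheiss' optimal bit allocation, $b_1$ and $b_2$ can be calculated
by

$$\begin{aligned}
b_1 &= \frac{b}{n_1 n_2} + \frac{n_1 n_2 - n_1 - n_2 + 1}{2 n_1 n_2} \log_2 [E(\gamma_0)], \\
b_2 &= \frac{b}{n_1 n_2} - \frac{n_1 + n_2 - 1}{2 n_1 n_2} \log_2 [E(\gamma_0)],
\end{aligned} \tag{4.15}$$

where $\gamma_0 = \sigma_x^2 / \sigma_\epsilon^2$ and the corresponding quantization levels are

$$\begin{aligned}
v_1 &= 2^{b_1} = 2^{\frac{b}{n_1 n_2}} [E(\gamma_0)]^{\frac{n_1 n_2 - n_1 - n_2 + 1}{2 n_1 n_2}}, \\
v_2 &= 2^{b_2} = 2^{\frac{b}{n_1 n_2}} [E(\gamma_0)]^{-\frac{n_1 + n_2 - 1}{2 n_1 n_2}},
\end{aligned} \tag{4.16}$$

respectively.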
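
The allocation (4.15) and (4.16) is straightforward to evaluate. The sketch below is only an illustration: the function name `allocate_bits` is hypothetical, and $b_1$ and $b_2$ are interpreted as bits per sample, so that the $n_1 + n_2 - 1$ observations in the first row and column and the $(n_1 - 1)(n_2 - 1)$ residuals together use $b$ bits.

```python
import math

def allocate_bits(b, n1, n2, e_gamma0):
    """Per-sample bits (b1, b2) and levels (v1, v2) following (4.15)-(4.16).

    b        : bits left after coding the parameters
    e_gamma0 : E(gamma_0), the prior mean of sigma_x^2 / sigma_eps^2
    """
    n = n1 * n2
    n_edge = n1 + n2 - 1          # observations with i = 1 or j = 1
    n_res = n - n_edge            # prediction residuals
    log_g = math.log2(e_gamma0)
    b1 = b / n + n_res / (2 * n) * log_g
    b2 = b / n - n_edge / (2 * n) * log_g
    return b1, b2, 2.0 ** b1, 2.0 ** b2

# Example: a 16 x 16 block, 3 bits per observation on average, E(gamma_0) = 4.
b1, b2, v1, v2 = allocate_bits(b=768, n1=16, n2=16, e_gamma0=4.0)
```

In this example the two allocations differ by $\tfrac{1}{2}\log_2 E(\gamma_0) = 1$ bit, and the level ratio $v_1 / v_2$ equals $[E(\gamma_0)]^{1/2} = 2$.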

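We now write $\sigma_e^2$ as a function of the quantization levels $v_{\phi_1}$, $v_{\phi_2}$, $v_1$ and $v_2$ by using
the next two lemmas.

Lemma 4.3.2

Suppose $\{x_{ij},\ i = 1, 2, \ldots, n_1,\ j = 1, 2, \ldots, n_2\}$ is a sample from the model (4.9).
Then,

$$E\big[(\Phi - \hat\Phi)' X_{ij} X_{ij}' (\Phi - \hat\Phi)\big] \cong \sigma_\epsilon^2 \sum_{k=1}^{2} \frac{G_k}{v_{\phi_k}^2}, \tag{4.17}$$

where

$$G_k = \frac{1}{12} \left[\int_{-\infty}^{\infty} \pi_k^{1/3}(\phi_k)\, d\phi_k\right]^2 \int_{-\infty}^{\infty} \frac{\pi_k^{1/3}(\phi_k)}{(1 - \phi_k^2)}\, d\phi_k.$$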

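Proof of Lemma 4.3.2

Let $P$ be the correlation matrix of $X_{ij}$. Expanding the product $(\Phi - \hat\Phi)' P (\Phi - \hat\Phi)$,
we have

$$\begin{aligned}
(\Phi - \hat\Phi)' P (\Phi - \hat\Phi)
&= \begin{pmatrix} \phi_1 - \hat\phi_1 \\ \phi_2 - \hat\phi_2 \\ -(\phi_1\phi_2 - \hat\phi_1\hat\phi_2) \end{pmatrix}'
\begin{pmatrix} 1 & \phi_1\phi_2 & \phi_2 \\ \phi_1\phi_2 & 1 & \phi_1 \\ \phi_2 & \phi_1 & 1 \end{pmatrix}
\begin{pmatrix} \phi_1 - \hat\phi_1 \\ \phi_2 - \hat\phi_2 \\ -(\phi_1\phi_2 - \hat\phi_1\hat\phi_2) \end{pmatrix} \\
&= (\phi_1 - \hat\phi_1)^2 + (\phi_2 - \hat\phi_2)^2 - \phi_1^2\phi_2^2 + \hat\phi_1^2\hat\phi_2^2 \\
&\quad + 2\phi_1\hat\phi_1\hat\phi_2(\phi_2 - \hat\phi_2) + 2\phi_2\hat\phi_1\hat\phi_2(\phi_1 - \hat\phi_1).
\end{aligned}$$

So

$$\begin{aligned}
E\big[(\Phi - \hat\Phi)' X_{ij} X_{ij}' (\Phi - \hat\Phi)\big]
&= \sigma_\epsilon^2 E\left[\frac{(\phi_1 - \hat\phi_1)^2 + (\phi_2 - \hat\phi_2)^2 - \phi_1^2\phi_2^2 + \hat\phi_1^2\hat\phi_2^2}{(1 - \phi_1^2)(1 - \phi_2^2)}\right] \\
&\quad + 2\sigma_\epsilon^2 E\left[\frac{\phi_1\hat\phi_1\hat\phi_2(\phi_2 - \hat\phi_2) + \phi_2\hat\phi_1\hat\phi_2(\phi_1 - \hat\phi_1)}{(1 - \phi_1^2)(1 - \phi_2^2)}\right].
\end{aligned}$$

The second term tends to zero by Lemma 3.5.2. We can consider only the first term.

Note that

$$E\left[\frac{(\phi_k - \hat\phi_k)^2}{(1 - \phi_k^2)}\right] \cong \frac{G_k}{v_{\phi_k}^2}$$

for $k = 1, 2$ when $v_{\phi_1}$ and $v_{\phi_2}$ are large. By adding and subtracting $\hat\phi_1^2\phi_2^2$ in the
numerator,

$$\begin{aligned}
&E\left[\frac{(\phi_1 - \hat\phi_1)^2 + (\phi_2 - \hat\phi_2)^2 - \phi_1^2\phi_2^2 + \hat\phi_1^2\hat\phi_2^2}{(1 - \phi_1^2)(1 - \phi_2^2)}\right] \\
&\quad = E\left[\frac{(\phi_1 - \hat\phi_1)^2 + (\phi_2 - \hat\phi_2)^2 - (\phi_1^2 - \hat\phi_1^2)\phi_2^2 - \hat\phi_1^2(\phi_2^2 - \hat\phi_2^2)}{(1 - \phi_1^2)(1 - \phi_2^2)}\right] \\
&\quad \cong E\left[\frac{(\phi_1 - \hat\phi_1)^2 (1 - \phi_2^2) + (\phi_2 - \hat\phi_2)^2 (1 - \phi_1^2)}{(1 - \phi_1^2)(1 - \phi_2^2)}\right] \\
&\quad = E\left[\frac{(\phi_1 - \hat\phi_1)^2}{(1 - \phi_1^2)}\right] + E\left[\frac{(\phi_2 - \hat\phi_2)^2}{(1 - \phi_2^2)}\right] \\
&\quad \cong \sum_{k=1}^{2} \frac{G_k}{v_{\phi_k}^2}
\end{aligned}$$

by Lemma 3.5.1.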
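
The algebraic expansion of the quadratic form used in the proof of Lemma 4.3.2 can be verified numerically. The following sketch is illustrative; the names `quad_form` and `expanded` and the shorthand `a, b, A, B` for $\phi_1, \phi_2, \hat\phi_1, \hat\phi_2$ are assumptions of this example.

```python
import numpy as np

def quad_form(a, b, A, B):
    """(Phi - Phi_hat)' P (Phi - Phi_hat) with Phi = (a, b, -a*b)."""
    d = np.array([a - A, b - B, -(a * b - A * B)])
    # Correlation matrix of (x[i-1,j], x[i,j-1], x[i-1,j-1]) under (4.6).
    P = np.array([[1.0, a * b, b],
                  [a * b, 1.0, a],
                  [b, a, 1.0]])
    return d @ P @ d

def expanded(a, b, A, B):
    """Expanded form of the same quadratic form, term by term."""
    return ((a - A)**2 + (b - B)**2 - a**2 * b**2 + A**2 * B**2
            + 2 * a * A * B * (b - B) + 2 * b * A * B * (a - A))

# Both evaluations should coincide for any parameter values.
pair = (quad_form(0.5, 0.4, 0.4, 0.3), expanded(0.5, 0.4, 0.4, 0.3))
```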
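
Lemma 4.3.3

Suppose $\{x_{ij},\ i = 1, 2, \ldots, n_1,\ j = 1, 2, \ldots, n_2\}$ is a sample from model (4.9).
Then,

$$E\big[\hat\Phi' (X_{ij} - \hat X_{ij})(X_{ij} - \hat X_{ij})' \hat\Phi\big]
= \left\{\sum_{k=1}^{2} \left[E(\phi_k^2) - \frac{D_k}{v_{\phi_k}^2}\right] + \prod_{k=1}^{2} \left[E(\phi_k^2) - \frac{D_k}{v_{\phi_k}^2}\right]\right\} I \sigma_e^2 v_2^{-2}.$$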


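Proof of Lemma 4.3.3

Note that the reconstruction error is equal to the quantization error which is
independent of the parameters. The means of the cross product terms tend to zero as
the quantization levels go to infinity. So

$$\begin{aligned}
E\big[\hat\Phi' (X_{ij} - \hat X_{ij})(X_{ij} - \hat X_{ij})' \hat\Phi\big]
&= E\big[\hat\phi_1^2 (x_{i-1,j} - \hat x_{i-1,j})^2 + \hat\phi_2^2 (x_{i,j-1} - \hat x_{i,j-1})^2 + \hat\phi_1^2\hat\phi_2^2 (x_{i-1,j-1} - \hat x_{i-1,j-1})^2\big] \\
&= \big[E(\hat\phi_1^2) + E(\hat\phi_2^2) + E(\hat\phi_1^2) E(\hat\phi_2^2)\big] I \sigma_e^2 v_2^{-2} \\
&= \left\{\sum_{k=1}^{2} \left[E(\phi_k^2) - \frac{D_k}{v_{\phi_k}^2}\right] + \prod_{k=1}^{2} \left[E(\phi_k^2) - \frac{D_k}{v_{\phi_k}^2}\right]\right\} I \sigma_e^2 v_2^{-2}.
\end{aligned}$$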

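From Lemma 4.3.2 and Lemma 4.3.3, we have

$$\sigma_e^2 = C(v_{\phi_1}, v_{\phi_2}, v_1, v_2)\, \sigma_\epsilon^2, \tag{4.18}$$

where

$$C(v_{\phi_1}, v_{\phi_2}, v_1, v_2) = \frac{1 + \sum_{k=1}^{2} G_k v_{\phi_k}^{-2}}{1 - \left\{\sum_{k=1}^{2} \left[E(\phi_k^2) - D_k v_{\phi_k}^{-2}\right] + \prod_{k=1}^{2} \left[E(\phi_k^2) - D_k v_{\phi_k}^{-2}\right]\right\} I v_2^{-2}}. \tag{4.19}$$

So by Lemma 3.2.1, the mean squared quantization error has the expression

$$E[(e_{ij} - \hat e_{ij})^2] = \sigma_\epsilon^2\, \frac{1 + \sum_{k=1}^{2} G_k v_{\phi_k}^{-2}}{v_2^2 I^{-1} - \sum_{k=1}^{2} \left[E(\phi_k^2) - D_k v_{\phi_k}^{-2}\right] - \prod_{k=1}^{2} \left[E(\phi_k^2) - D_k v_{\phi_k}^{-2}\right]}. \tag{4.20}$$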


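By Theorem 3.4.1, the total mean square quantization error is

$$\begin{aligned}
\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} E[(x_{ij} - \hat x_{ij})^2]
&= n_1 n_2 \sigma_\epsilon^2\, \frac{1 + \sum_{k=1}^{2} G_k v_{\phi_k}^{-2}}{v_2^2 I^{-1} - \sum_{k=1}^{2} \left[E(\phi_k^2) - D_k v_{\phi_k}^{-2}\right] - \prod_{k=1}^{2} \left[E(\phi_k^2) - D_k v_{\phi_k}^{-2}\right]} \\
&= n_1 n_2 \sigma_\epsilon^2 H,
\end{aligned}$$

say.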

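The best choice of $\{v_{\phi_k}\}$ can be obtained by differentiating $H$ with respect to
$\{v_{\phi_k}\}$ and then setting the derivative to be zero. Note that

$$v_{\phi_k} = 2^{m_k},$$

$$v_2 = 2^{\frac{s - \sum_{k=1}^{2} m_k}{n_1 n_2}} [E(\gamma_0)]^{-\frac{n_1 + n_2 - 1}{2 n_1 n_2}}
= (v_{\phi_1} v_{\phi_2})^{-\frac{1}{n_1 n_2}}\, 2^{\frac{s}{n_1 n_2}} [E(\gamma_0)]^{-\frac{n_1 + n_2 - 1}{2 n_1 n_2}}.$$

So

$$\frac{\partial v_2^2}{\partial v_{\phi_k}} = -\frac{2 v_2^2}{n_1 n_2 v_{\phi_k}}.$$

The optimal $\{v_{\phi_k}\}$ will be the solution for

$$\frac{\partial U}{\partial v_{\phi_k}} W - U \frac{\partial W}{\partial v_{\phi_k}} = 0, \tag{4.21}$$

for $k = 1, 2$ subject to $v_1 / v_2 = [E(\gamma_0)]^{1/2}$ and $m + b = s$, where

$$U = 1 + \sum_{k=1}^{2} G_k v_{\phi_k}^{-2},$$

$$W = v_2^2 I^{-1} - \sum_{k=1}^{2} \left[E(\phi_k^2) - D_k v_{\phi_k}^{-2}\right] - \prod_{k=1}^{2} \left[E(\phi_k^2) - D_k v_{\phi_k}^{-2}\right].$$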


4.4  Simulation 

The approximation for the mean square quantization error using (4.9) and the
simulation result are listed in Table 4.1. In the simulation, the parameters $\phi_1$ and
$\phi_2$ were generated uniformly on the interval $(0, 1 - \delta)$ with $\delta = 10^{-2}$. Data sets
with $n_1 = n_2 = 16, 32, 48,$ and $64$ were generated and the white noise is normally
distributed with zero mean and unit variance. Since $n_1 = n_2$ and $\phi_1$ and $\phi_2$ have the
same distribution, we assign the same number of bits for quantizing $\phi_1$ and $\phi_2$. The
simulation was run 1000 times with average bits $s/(n_1 n_2) = 3, 4,$ and $5$.

We can see that formula (4.20) gives a good approximation for the quantization
error even if the average bit number for each observation is not so large. The
difference between the approximation and the simulation gets smaller as the bit number
increases. The optimal number of bits assigned to the parameters determined by
(4.20) coincides with or is very close to that from simulation. So the approximation
formula can be used to determine the bit allocation.

$s$ = total number of bits available, $m_k$ = number of bits assigned to $\phi_k$, $n_1$, $n_2$ = length and width of block.


Table  4.1.  Continued. 